[GitHub] [spark] maropu commented on issue #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product
maropu commented on issue #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product
URL: https://github.com/apache/spark/pull/25109#issuecomment-510356646

Ur, looks bad. I checked the reserved keywords in PostgreSQL again:
https://www.postgresql.org/docs/current/sql-keywords-appendix.html
FALSE is reserved, but the query works well in PostgreSQL:
```
postgres=# select 1 as false;
 false
-------
     1
(1 row)
```
How about Oracle and the other databases? Can they accept that query, too?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] peter-toth commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
peter-toth commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
URL: https://github.com/apache/spark/pull/25029#discussion_r302389500

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala ##
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import org.apache.spark.sql.catalyst.expressions.SubqueryExpression
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, With}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Analyze WITH nodes and substitute child plan with CTE definitions.
+ */
+object CTESubstitution extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = if (SQLConf.get.legacyCTESubstitutionEnabled) {
+    legacyTraverseAndSubstituteCTE(plan)
+  } else {
+    traverseAndSubstituteCTE(plan, false)
+  }
+
+  private def legacyTraverseAndSubstituteCTE(plan: LogicalPlan): LogicalPlan = {
+    plan.resolveOperatorsUp {
+      case With(child, relations) =>
+        // substitute CTE expressions right-to-left to resolve references to previous CTEs:
+        // with a as (select * from t), b as (select * from a) select * from b
+        relations.foldRight(child) {
+          case ((cteName, ctePlan), currentPlan) => substituteCTE(currentPlan, cteName, ctePlan)
+        }
+    }
+  }
+
+  /**
+   * Traverse the plan and expression nodes as a tree and replace matching references to CTE
+   * definitions.
+   * - If the rule encounters a WITH node then it substitutes the child of the node with CTE
+   *   definitions of the node right-to-left order as a definition can reference to a previous
+   *   one.
+   *   For example the following query is valid:
+   *   WITH
+   *     t AS (SELECT 1),
+   *     t2 AS (SELECT * FROM t)
+   *   SELECT * FROM t2
+   * - If a CTE definition contains an inner WITH node then substitution of inner should take
+   *   precedence because it can shadow an outer CTE definition.
+   *   For example the following query should return 2:
+   *   WITH
+   *     t AS (SELECT 1),
+   *     t2 AS (
+   *       WITH t AS (SELECT 2)
+   *       SELECT * FROM t
+   *     )
+   *   SELECT * FROM t2
+   * - If a CTE definition contains a subquery that contains an inner WITH node then substitution
+   *   of inner should take precedence because it can shadow an outer CTE definition.
+   *   For example the following query should return 2:
+   *   WITH t AS (SELECT 1 AS c)
+   *   SELECT max(c) FROM (
+   *     WITH t AS (SELECT 2 AS c)
+   *     SELECT * FROM t
+   *   )
+   * - If a CTE definition contains a subquery expression that contains an inner WITH node then
+   *   substitution of inner should take precedence because it can shadow an outer CTE
+   *   definition.
+   *   For example the following query should return 2:
+   *   WITH t AS (SELECT 1)
+   *   SELECT (
+   *     WITH t AS (SELECT 2)
+   *     SELECT * FROM t
+   *   )
+   * @param plan the plan to be traversed
+   * @param inTraverse whether the current traverse is called from another traverse, only in this
+   *                   case name collision can occur
+   * @return then plan where CTE substitution is applied
+   */
+  private def traverseAndSubstituteCTE(plan: LogicalPlan, inTraverse: Boolean): LogicalPlan = {
+    plan.resolveOperatorsUp {
+      case With(child: LogicalPlan, relations) =>
+        val traversedChild = child transformExpressions {
+          case e: SubqueryExpression => e.withNewPlan(traverseAndSubstituteCTE(e.plan, true))
+        }
+
+        relations.foldRight(traversedChild) {
+          case ((cteName, ctePlan), currentPlan) =>
+            lazy val substitutedCTEPlan = traverseAndSubstituteCTE(ctePlan, true)

Review comment:
None of the tests fail actually, but both lazy and call-by-name are part of the optimization.
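The interplay of `lazy val` and call-by-name mentioned in the review comment can be sketched in plain Scala, without any Spark dependency. Everything here (`LazyByNameSketch`, `expensivePlan`, `substitute`, `run`) is a hypothetical stand-in, not code from the PR: the by-name parameter defers building the replacement until a matching reference is actually found, and the `lazy val` makes sure it is built at most once even if it is referenced several times.

```scala
object LazyByNameSketch {
  // Counts how often the "expensive" replacement plan is actually built.
  var evaluations = 0

  // Hypothetical stand-in for traversing and substituting a CTE definition.
  def expensivePlan(): String = {
    evaluations += 1
    "SELECT 1"
  }

  // `replacement` is passed by name: its expression is evaluated only when
  // (and every time) it is used inside the body.
  def substitute(tokens: Seq[String], name: String, replacement: => String): Seq[String] =
    tokens.map(t => if (t == name) s"($replacement)" else t)

  def run(tokens: Seq[String]): Seq[String] = {
    // `lazy val` memoizes the result, so repeated by-name uses build it once.
    lazy val plan = expensivePlan()
    substitute(tokens, "t", plan)
  }
}
```

With this arrangement a query that never references `t` pays nothing for the replacement, and a query that references it twice pays once.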
[GitHub] [spark] beliefer commented on a change in pull request #25074: [SPARK-27924]Support ANSI SQL Boolean-Predicate syntax
beliefer commented on a change in pull request #25074: [SPARK-27924]Support ANSI SQL Boolean-Predicate syntax
URL: https://github.com/apache/spark/pull/25074#discussion_r302389418

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/booleanExpressions.scala ##
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.BooleanType
+

Review comment:
This class was written with reference to `nullExpressions.scala`, but moving it into `predicates.scala` is OK too. What do you think?
[GitHub] [spark] zhengruifeng commented on issue #24793: [SPARK-27944][ML] Unify the behavior of checking empty output column names
zhengruifeng commented on issue #24793: [SPARK-27944][ML] Unify the behavior of checking empty output column names
URL: https://github.com/apache/spark/pull/24793#issuecomment-510356140

ping @srowen, would you mind helping to review this?
[GitHub] [spark] SparkQA commented on issue #24860: [SPARK-28034][SQL][TEST] Port with.sql
SparkQA commented on issue #24860: [SPARK-28034][SQL][TEST] Port with.sql
URL: https://github.com/apache/spark/pull/24860#issuecomment-510355699

**[Test build #107515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107515/testReport)** for PR 24860 at commit [`615f592`](https://github.com/apache/spark/commit/615f59273f2ddd1d167627f3e6c62249adca684d).
[GitHub] [spark] SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
SparkQA commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-510355679

**[Test build #107514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107514/testReport)** for PR 25001 at commit [`33c7ad4`](https://github.com/apache/spark/commit/33c7ad44905272c7b6dc2433160c7329689758ee).
[GitHub] [spark] AmplabJenkins commented on issue #24860: [SPARK-28034][SQL][TEST] Port with.sql
AmplabJenkins commented on issue #24860: [SPARK-28034][SQL][TEST] Port with.sql
URL: https://github.com/apache/spark/pull/24860#issuecomment-510355151

Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24860: [SPARK-28034][SQL][TEST] Port with.sql
AmplabJenkins commented on issue #24860: [SPARK-28034][SQL][TEST] Port with.sql
URL: https://github.com/apache/spark/pull/24860#issuecomment-510355155

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12641/
Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-510355063

Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
AmplabJenkins commented on issue #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
URL: https://github.com/apache/spark/pull/25001#issuecomment-510355071

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12640/
Test PASSed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25101#discussion_r302387733

## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except.sql ##
@@ -0,0 +1,58 @@
+-- This test file was converted from except.sql.
+-- Tests different scenarios of except operation
+create temporary view t1 as select * from values
+  ("one", 1),
+  ("two", 2),
+  ("three", 3),
+  ("one", NULL)
+  as t1(k, v);
+
+create temporary view t2 as select * from values
+  ("one", 1),
+  ("two", 22),
+  ("one", 5),
+  ("one", NULL),
+  (NULL, 5)
+  as t2(k, v);
+
+
+-- Except operation that will be replaced by left anti join
+SELECT * FROM t1 EXCEPT SELECT * FROM t2;
+
+
+-- Except operation that will be replaced by Filter: SPARK-22181
+SELECT * FROM t1 EXCEPT SELECT * FROM t1 where udf(v) <> 1 and v <> udf(2);
+
+
+-- Except operation that will be replaced by Filter: SPARK-22181
+SELECT * FROM t1 where udf(v) <> 1 and v <> udf(22) EXCEPT SELECT * FROM t1 where udf(v) <> 2 and v >= udf(3);
+
+
+-- Except operation that will be replaced by Filter: SPARK-22181
+SELECT t1.* FROM t1, t2 where t1.k = t2.k
+EXCEPT
+SELECT t1.* FROM t1, t2 where t1.k = t2.k and t1.k != udf('one');
+
+
+-- Except operation that will be replaced by left anti join
+SELECT * FROM t2 where v >= udf(1) and udf(v) <> 22 EXCEPT SELECT * FROM t1;
+
+
+-- Except operation that will be replaced by left anti join
+SELECT (SELECT min(udf(k)) FROM t2 WHERE t2.k = t1.k) min_t2 FROM t1
+MINUS
+SELECT (SELECT udf(min(k)) FROM t2) abs_min_t2 FROM t1 WHERE t1.k = udf('one');
+
+
+-- Except operation that will be replaced by left anti join
+SELECT t1.k
+FROM t1
+WHERE t1.v <= (SELECT max(udf(t2.v))

Review comment:
Let's do that, and comment the test with a new JIRA as guided in the parent JIRA (I think I wrote some words in the guide for this case as well).
Actually, finding such cases and fixing them is one of the key points of doing this.
[GitHub] [spark] cloud-fan commented on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
cloud-fan commented on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
URL: https://github.com/apache/spark/pull/25111#issuecomment-510354304

cc @hvanhovell @maryannxue @viirya @gatorsmile @HyukjinKwon
[GitHub] [spark] cloud-fan commented on a change in pull request #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
cloud-fan commented on a change in pull request #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
URL: https://github.com/apache/spark/pull/25111#discussion_r302386994

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SaveIntoDataSourceCommand.scala ##
@@ -52,4 +52,8 @@ case class SaveIntoDataSourceCommand(
     val redacted = SQLConf.get.redactOptions(options)
     s"SaveIntoDataSourceCommand ${dataSource}, ${redacted}, ${mode}"
   }
+
+  override def clone(): LogicalPlan = {
+    SaveIntoDataSourceCommand(query.clone(), dataSource, options, mode)

Review comment:
The `mapChildren` in `TreeNode` will change the map type (from `CaseInsensitiveMap` to a normal map).
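The failure mode described in this comment can be illustrated with a small, self-contained sketch. It does not reproduce Spark's `mapChildren` or its `CaseInsensitiveMap`; `CaseInsensitiveOptions`, `genericRebuild` and `explicitClone` are hypothetical names, and the assumption is only that a generic rebuild step sees key/value pairs and therefore hands back a plain `Map`.

```scala
object CaseInsensitiveMapSketch {
  import java.util.Locale

  // Hypothetical, simplified stand-in for a case-insensitive options map.
  final class CaseInsensitiveOptions(val original: Map[String, String]) {
    private val lowerCased: Map[String, String] =
      original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

    // Lookups ignore the case of the key.
    def get(key: String): Option[String] = lowerCased.get(key.toLowerCase(Locale.ROOT))
  }

  // A generic rebuild only knows about key/value pairs, so the result is a
  // plain Map and the case-insensitive lookup behavior is silently lost.
  def genericRebuild(options: CaseInsensitiveOptions): Map[String, String] =
    options.original.map { case (k, v) => k -> v }

  // An explicit clone passes the options through a constructor that keeps
  // the wrapper type, which is the spirit of the `clone()` override above.
  def explicitClone(options: CaseInsensitiveOptions): CaseInsensitiveOptions =
    new CaseInsensitiveOptions(options.original)
}
```

After `genericRebuild`, a lookup such as `get("PATH")` on options stored under `"Path"` no longer succeeds, while the explicitly cloned wrapper still resolves it.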
[GitHub] [spark] HyukjinKwon commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
HyukjinKwon commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
URL: https://github.com/apache/spark/pull/25110#issuecomment-510353989

Yea, let me fix that one while I'm here.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
AmplabJenkins removed a comment on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
URL: https://github.com/apache/spark/pull/25111#issuecomment-510352855

Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
cloud-fan commented on a change in pull request #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
URL: https://github.com/apache/spark/pull/25111#discussion_r302386743

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala ##
@@ -168,4 +168,6 @@ case object ResetCommand extends RunnableCommand with IgnoreCachedData {
     sparkSession.sessionState.conf.clear()
     Seq.empty[Row]
   }
+
+  override def clone(): LogicalPlan = this

Review comment:
The `clone` defined in `TreeNode` doesn't work for a case object.
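Why a singleton wants `clone()` to return `this` can be shown with a toy hierarchy. This is only a sketch under assumptions: `Node`, `Leaf` and `Reset` are hypothetical names, and the generic copy is modeled by a case-class `copy()`, not by the actual `TreeNode` mechanism.

```scala
object SingletonCloneSketch {
  trait Node {
    def cloneNode(): Node
  }

  // An ordinary node can be rebuilt from its constructor arguments.
  final case class Leaf(name: String) extends Node {
    def cloneNode(): Node = copy()
  }

  // A case object is a singleton: there is no constructor to re-invoke and no
  // second instance to create, so cloning can only hand back the object itself.
  case object Reset extends Node {
    def cloneNode(): Node = this
  }
}
```

Cloning a `Leaf` yields an equal but distinct instance, while cloning `Reset` yields the very same reference.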
[GitHub] [spark] AmplabJenkins removed a comment on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
AmplabJenkins removed a comment on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
URL: https://github.com/apache/spark/pull/25111#issuecomment-510352863

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12639/
Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
cloud-fan commented on a change in pull request #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
URL: https://github.com/apache/spark/pull/25111#discussion_r302386541

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala ##
@@ -166,18 +167,27 @@ object InMemoryRelation {
       outputOrdering: Seq[SortOrder],
       statsOfPlanToCache: Statistics): InMemoryRelation = {
     val relation = InMemoryRelation(output, cacheBuilder, outputOrdering)
-    relation.statsOfPlanToCache = statsOfPlanToCache
+    relation.setStatsOfPlanToCache(statsOfPlanToCache)
     relation
   }
+
+  val STATS_OF_PLAN_TO_CACHE_TAG = new TreeNodeTag[Statistics]("stats_of_plan_to_cache")
 }
 
 case class InMemoryRelation(
     output: Seq[Attribute],
     @transient cacheBuilder: CachedRDDBuilder,
     override val outputOrdering: Seq[SortOrder])
   extends logical.LeafNode with MultiInstanceRelation {
+  import InMemoryRelation.STATS_OF_PLAN_TO_CACHE_TAG
+
+  def setStatsOfPlanToCache(statsOfPlanToCache: Statistics): Unit = {
+    setTagValue(STATS_OF_PLAN_TO_CACHE_TAG, statsOfPlanToCache)
+  }
 
-  @volatile var statsOfPlanToCache: Statistics = null

Review comment:
ditto
[GitHub] [spark] cloud-fan commented on a change in pull request #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
cloud-fan commented on a change in pull request #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
URL: https://github.com/apache/spark/pull/25111#discussion_r302386462

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/LogicalPlanStats.scala ##
@@ -18,33 +18,38 @@ package org.apache.spark.sql.catalyst.plans.logical.statsEstimation
 
 import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.trees.TreeNodeTag
 
 /**
  * A trait to add statistics propagation to [[LogicalPlan]].
  */
 trait LogicalPlanStats { self: LogicalPlan =>
+  import LogicalPlanStats.STATS_CACHE_TAG
 
   /**
    * Returns the estimated statistics for the current logical plan node. Under the hood, this
    * method caches the return value, which is computed based on the configuration passed in the
    * first time. If the configuration changes, the cache can be invalidated by calling
    * [[invalidateStatsCache()]].
    */
-  def stats: Statistics = statsCache.getOrElse {
+  def stats: Statistics = statsOpt.getOrElse {
     if (conf.cboEnabled) {
-      statsCache = Option(BasicStatsPlanVisitor.visit(self))
+      setTagValue(STATS_CACHE_TAG, BasicStatsPlanVisitor.visit(self))
     } else {
-      statsCache = Option(SizeInBytesOnlyStatsPlanVisitor.visit(self))
+      setTagValue(STATS_CACHE_TAG, SizeInBytesOnlyStatsPlanVisitor.visit(self))
     }
-    statsCache.get
+    statsOpt.get
   }
 
-  /** A cache for the estimated statistics, such that it will only be computed once. */
-  protected var statsCache: Option[Statistics] = None

Review comment:
It's fragile to use a member variable to keep stats, as they will be lost after a copy.
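The fragility mentioned in the review comment (state held in a member variable disappears when the node is copied) can be reproduced with plain case classes. This is a minimal sketch, assuming, as the diff suggests, that tag values are carried over when a node is copied; `PlanWithVar`, `PlanWithTags` and `copyWithTags` are hypothetical and only model that idea.

```scala
object StatsTagSketch {
  import scala.collection.mutable

  // Variant 1: the cache lives in a member variable outside the constructor,
  // so the compiler-generated `copy` knows nothing about it.
  final case class PlanWithVar(name: String) {
    var statsCache: Option[Long] = None
  }

  // Variant 2: the cache lives in a tag map that the copy step carries over.
  final case class PlanWithTags(name: String) {
    val tags: mutable.Map[String, Long] = mutable.Map.empty

    // Hypothetical copy helper that propagates tags to the new node.
    def copyWithTags(newName: String = name): PlanWithTags = {
      val copied = copy(name = newName)
      copied.tags ++= tags
      copied
    }
  }
}
```

Copying a `PlanWithVar` resets `statsCache` to `None`, whereas `copyWithTags` keeps the cached value available on the new node.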
[GitHub] [spark] maropu commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
maropu commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses URL: https://github.com/apache/spark/pull/25029#discussion_r302386540 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala ## @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.expressions.SubqueryExpression +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, With} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.internal.SQLConf + +/** + * Analyze WITH nodes and substitute child plan with CTE definitions. 
+ */ +object CTESubstitution extends Rule[LogicalPlan] { + def apply(plan: LogicalPlan): LogicalPlan = if (SQLConf.get.legacyCTESubstitutionEnabled) { +legacyTraverseAndSubstituteCTE(plan) + } else { +traverseAndSubstituteCTE(plan, false) + } + + private def legacyTraverseAndSubstituteCTE(plan: LogicalPlan): LogicalPlan = { +plan.resolveOperatorsUp { + case With(child, relations) => +// substitute CTE expressions right-to-left to resolve references to previous CTEs: +// with a as (select * from t), b as (select * from a) select * from b +relations.foldRight(child) { + case ((cteName, ctePlan), currentPlan) => substituteCTE(currentPlan, cteName, ctePlan) +} +} + } + + /** + * Traverse the plan and expression nodes as a tree and replace matching references to CTE + * definitions. + * - If the rule encounters a WITH node then it substitutes the child of the node with CTE + * definitions of the node right-to-left order as a definition can reference to a previous + * one. + * For example the following query is valid: + * WITH + * t AS (SELECT 1), + * t2 AS (SELECT * FROM t) + * SELECT * FROM t2 + * - If a CTE definition contains an inner WITH node then substitution of inner should take + * precedence because it can shadow an outer CTE definition. + * For example the following query should return 2: + * WITH + * t AS (SELECT 1), + * t2 AS ( + * WITH t AS (SELECT 2) + * SELECT * FROM t + * ) + * SELECT * FROM t2 + * - If a CTE definition contains a subquery that contains an inner WITH node then substitution + * of inner should take precedence because it can shadow an outer CTE definition. + * For example the following query should return 2: + * WITH t AS (SELECT 1 AS c) + * SELECT max(c) FROM ( + * WITH t AS (SELECT 2 AS c) + * SELECT * FROM t + * ) + * - If a CTE definition contains a subquery expression that contains an inner WITH node then + * substitution of inner should take precedence because it can shadow an outer CTE + * definition. 
+ * For example the following query should return 2: + * WITH t AS (SELECT 1) + * SELECT ( + * WITH t AS (SELECT 2) + * SELECT * FROM t + * ) + * @param plan the plan to be traversed + * @param inTraverse whether the current traverse is called from another traverse, only in this + * case name collision can occur + * @return then plan where CTE substitution is applied + */ + private def traverseAndSubstituteCTE(plan: LogicalPlan, inTraverse: Boolean): LogicalPlan = { +plan.resolveOperatorsUp { + case With(child: LogicalPlan, relations) => +val traversedChild = child transformExpressions { + case e: SubqueryExpression => e.withNewPlan(traverseAndSubstituteCTE(e.plan, true)) +} + +relations.foldRight(traversedChild) { + case ((cteName, ctePlan), currentPlan) => +lazy val substitutedCTEPlan = traverseAndSubstituteCTE(ctePlan, true) +substituteCTE(currentPlan, cteName, substitutedCTEPlan) +} + + case other if inTraverse => Review comment: aha, ok. could you leave a comment here, too?
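The shadowing cases listed in the doc comment above can be sketched with a toy model. This is only an illustration under stated assumptions: `Lit`, `Ref`, `With` and `resolve` below are hypothetical stand-ins, not Catalyst classes, and the sketch uses a plain environment-passing resolver rather than the `foldRight` substitution used in the rule under review.

```scala
object CteShadowingSketch {
  // Hypothetical miniature plan types, NOT the Catalyst ones.
  sealed trait Plan
  case class Lit(value: Int) extends Plan                                // SELECT <value>
  case class Ref(name: String) extends Plan                              // SELECT * FROM <name>
  case class With(child: Plan, ctes: Seq[(String, Plan)]) extends Plan   // WITH ... <child>

  // Resolve references against the innermost enclosing scope first,
  // so a definition in an inner WITH shadows an outer one of the same name.
  def resolve(plan: Plan, env: Map[String, Plan] = Map.empty): Plan = plan match {
    case lit: Lit => lit
    case Ref(name) => env.getOrElse(name, Ref(name))
    case With(child, ctes) =>
      // A later definition may reference an earlier one, so the scope grows left to right.
      val scope = ctes.foldLeft(env) { case (current, (name, cte)) =>
        current + (name -> resolve(cte, current))
      }
      resolve(child, scope)
  }
}
```

In this model, `WITH t AS (SELECT 1), t2 AS (WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2` is written as `With(Ref("t2"), Seq("t" -> Lit(1), "t2" -> With(Ref("t"), Seq("t" -> Lit(2)))))`, and `resolve` reduces it to `Lit(2)`, matching the expectation stated in the doc comment.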
[GitHub] [spark] beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r302386285 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala ## @@ -51,71 +51,123 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { // null handling checkLiteralRow(Literal.create(null, StringType).like(_), "a", null) +checkLiteralRow(Literal.create(null, StringType).like(_, "/"), "a", null) Review comment: OK. I add a new test case.
[GitHub] [spark] SparkQA commented on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner
SparkQA commented on issue #25111: [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner URL: https://github.com/apache/spark/pull/25111#issuecomment-510353507 **[Test build #107513 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107513/testReport)** for PR 25111 at commit [`656ae55`](https://github.com/apache/spark/commit/656ae55bc1ce362e7e4470336fc3ea3d9127bf88).
[GitHub] [spark] maropu commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
maropu commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses URL: https://github.com/apache/spark/pull/25029#discussion_r302385954 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala ## @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.expressions.SubqueryExpression +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, With} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.internal.SQLConf + +/** + * Analyze WITH nodes and substitute child plan with CTE definitions. 
+ */ +object CTESubstitution extends Rule[LogicalPlan] { + def apply(plan: LogicalPlan): LogicalPlan = if (SQLConf.get.legacyCTESubstitutionEnabled) { +legacyTraverseAndSubstituteCTE(plan) + } else { +traverseAndSubstituteCTE(plan, false) + } + + private def legacyTraverseAndSubstituteCTE(plan: LogicalPlan): LogicalPlan = { +plan.resolveOperatorsUp { + case With(child, relations) => +// substitute CTE expressions right-to-left to resolve references to previous CTEs: +// with a as (select * from t), b as (select * from a) select * from b +relations.foldRight(child) { + case ((cteName, ctePlan), currentPlan) => substituteCTE(currentPlan, cteName, ctePlan) +} +} + } + + /** + * Traverse the plan and expression nodes as a tree and replace matching references to CTE + * definitions. + * - If the rule encounters a WITH node then it substitutes the child of the node with CTE + * definitions of the node right-to-left order as a definition can reference to a previous + * one. + * For example the following query is valid: + * WITH + * t AS (SELECT 1), + * t2 AS (SELECT * FROM t) + * SELECT * FROM t2 + * - If a CTE definition contains an inner WITH node then substitution of inner should take + * precedence because it can shadow an outer CTE definition. + * For example the following query should return 2: + * WITH + * t AS (SELECT 1), + * t2 AS ( + * WITH t AS (SELECT 2) + * SELECT * FROM t + * ) + * SELECT * FROM t2 + * - If a CTE definition contains a subquery that contains an inner WITH node then substitution + * of inner should take precedence because it can shadow an outer CTE definition. + * For example the following query should return 2: + * WITH t AS (SELECT 1 AS c) + * SELECT max(c) FROM ( + * WITH t AS (SELECT 2 AS c) + * SELECT * FROM t + * ) + * - If a CTE definition contains a subquery expression that contains an inner WITH node then + * substitution of inner should take precedence because it can shadow an outer CTE + * definition. 
+ * For example the following query should return 2: + * WITH t AS (SELECT 1) + * SELECT ( + * WITH t AS (SELECT 2) + * SELECT * FROM t + * ) + * @param plan the plan to be traversed + * @param inTraverse whether the current traverse is called from another traverse, only in this + * case name collision can occur + * @return then plan where CTE substitution is applied + */ + private def traverseAndSubstituteCTE(plan: LogicalPlan, inTraverse: Boolean): LogicalPlan = { +plan.resolveOperatorsUp { + case With(child: LogicalPlan, relations) => +val traversedChild = child transformExpressions { + case e: SubqueryExpression => e.withNewPlan(traverseAndSubstituteCTE(e.plan, true)) +} + +relations.foldRight(traversedChild) { + case ((cteName, ctePlan), currentPlan) => +lazy val substitutedCTEPlan = traverseAndSubstituteCTE(ctePlan, true) Review comment: Which test fails if the call-by-name parameter is removed?
[GitHub] [spark] maropu commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
maropu commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses URL: https://github.com/apache/spark/pull/25029#discussion_r302385954 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala ## @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.expressions.SubqueryExpression +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, With} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.internal.SQLConf + +/** + * Analyze WITH nodes and substitute child plan with CTE definitions. 
+ */ +object CTESubstitution extends Rule[LogicalPlan] { + def apply(plan: LogicalPlan): LogicalPlan = if (SQLConf.get.legacyCTESubstitutionEnabled) { +legacyTraverseAndSubstituteCTE(plan) + } else { +traverseAndSubstituteCTE(plan, false) + } + + private def legacyTraverseAndSubstituteCTE(plan: LogicalPlan): LogicalPlan = { +plan.resolveOperatorsUp { + case With(child, relations) => +// substitute CTE expressions right-to-left to resolve references to previous CTEs: +// with a as (select * from t), b as (select * from a) select * from b +relations.foldRight(child) { + case ((cteName, ctePlan), currentPlan) => substituteCTE(currentPlan, cteName, ctePlan) +} +} + } + + /** + * Traverse the plan and expression nodes as a tree and replace matching references to CTE + * definitions. + * - If the rule encounters a WITH node then it substitutes the child of the node with CTE + * definitions of the node right-to-left order as a definition can reference to a previous + * one. + * For example the following query is valid: + * WITH + * t AS (SELECT 1), + * t2 AS (SELECT * FROM t) + * SELECT * FROM t2 + * - If a CTE definition contains an inner WITH node then substitution of inner should take + * precedence because it can shadow an outer CTE definition. + * For example the following query should return 2: + * WITH + * t AS (SELECT 1), + * t2 AS ( + * WITH t AS (SELECT 2) + * SELECT * FROM t + * ) + * SELECT * FROM t2 + * - If a CTE definition contains a subquery that contains an inner WITH node then substitution + * of inner should take precedence because it can shadow an outer CTE definition. + * For example the following query should return 2: + * WITH t AS (SELECT 1 AS c) + * SELECT max(c) FROM ( + * WITH t AS (SELECT 2 AS c) + * SELECT * FROM t + * ) + * - If a CTE definition contains a subquery expression that contains an inner WITH node then + * substitution of inner should take precedence because it can shadow an outer CTE + * definition. 
+ * For example the following query should return 2: + * WITH t AS (SELECT 1) + * SELECT ( + * WITH t AS (SELECT 2) + * SELECT * FROM t + * ) + * @param plan the plan to be traversed + * @param inTraverse whether the current traverse is called from another traverse, only in this + * case name collision can occur + * @return then plan where CTE substitution is applied + */ + private def traverseAndSubstituteCTE(plan: LogicalPlan, inTraverse: Boolean): LogicalPlan = { +plan.resolveOperatorsUp { + case With(child: LogicalPlan, relations) => +val traversedChild = child transformExpressions { + case e: SubqueryExpression => e.withNewPlan(traverseAndSubstituteCTE(e.plan, true)) +} + +relations.foldRight(traversedChild) { + case ((cteName, ctePlan), currentPlan) => +lazy val substitutedCTEPlan = traverseAndSubstituteCTE(ctePlan, true) Review comment: Which test fails if we remove the call-by-name parameter?
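For readers following the question about the call-by-name parameter, here is a minimal sketch of how a `lazy val` and a by-name parameter interact in Scala. It is not the PR's code: `expensivePlan`, `substitute` and `run` are hypothetical names, and strings stand in for logical plans. The assumption illustrated is that the expensive computation should run at most once, and only if a matching reference is actually found.

```scala
object LazyByNameSketch {
  // Counts how often the (hypothetical) expensive computation actually runs.
  var evaluations = 0

  // Stand-in for a costly recursive traversal of a CTE definition.
  def expensivePlan(): String = { evaluations += 1; "substituted" }

  // `replacement` is a by-name parameter: it is evaluated only where it is referenced.
  def substitute(refs: Seq[String], name: String, replacement: => String): Seq[String] =
    refs.map(ref => if (ref == name) replacement else ref)

  // Returns the rewritten references and the number of evaluations that took place.
  def run(refs: Seq[String]): (Seq[String], Int) = {
    evaluations = 0
    // The lazy val memoizes the result, so repeated references do not recompute it.
    lazy val plan = expensivePlan()
    (substitute(refs, "t", plan), evaluations)
  }
}
```

With this sketch, `LazyByNameSketch.run(Seq("a", "b"))` performs zero evaluations because nothing references `t`, while `LazyByNameSketch.run(Seq("t", "x", "t"))` performs exactly one. If `replacement` were an ordinary by-value parameter, the `lazy val` would be forced at the call site even when no reference matches.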
[GitHub] [spark] AmplabJenkins commented on issue #25111: [SPARK-xxx][SQL] clone the query plan between analyzer, optimizer and planner
AmplabJenkins commented on issue #25111: [SPARK-xxx][SQL] clone the query plan between analyzer, optimizer and planner URL: https://github.com/apache/spark/pull/25111#issuecomment-510352863 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12639/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25111: [SPARK-xxx][SQL] clone the query plan between analyzer, optimizer and planner
AmplabJenkins commented on issue #25111: [SPARK-xxx][SQL] clone the query plan between analyzer, optimizer and planner URL: https://github.com/apache/spark/pull/25111#issuecomment-510352855 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
dongjoon-hyun commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation URL: https://github.com/apache/spark/pull/25110#issuecomment-510352914 Nit: can we use Decimal instead? In the case of `corr`, casting to int seems to always return 0.
[GitHub] [spark] cloud-fan opened a new pull request #25111: [SPARK-xxx][SQL] clone the query plan between analyzer, optimizer and planner
cloud-fan opened a new pull request #25111: [SPARK-xxx][SQL] clone the query plan between analyzer, optimizer and planner URL: https://github.com/apache/spark/pull/25111 ## What changes were proposed in this pull request? The query plan was designed to be immutable, but sometimes we do allow it to carry mutable state because of the complexity of the SQL system. One example is `TreeNodeTag`. It is a state of `TreeNode` and can be carried over during copy and transform. The adaptive execution framework relies on it to link the logical and physical plans. This leads to a problem: when we get `QueryExecution#analyzed`, the plan can be changed unexpectedly because it is mutable. I hit a real issue in https://github.com/apache/spark/pull/25107 : I use `TreeNodeTag` to carry the dataset id in logical plans. However, the analyzed plan ends up with many duplicated dataset id tags in different nodes. It turns out that the optimizer transforms the logical plan and adds the tag to more nodes. For example, the logical plan is `SubqueryAlias(Filter(...))`, and I expect only the `SubqueryAlias` to have the dataset id tag. However, the optimizer removes `SubqueryAlias` and carries over the dataset id tag to `Filter`. When I go back to the analyzed plan, both `SubqueryAlias` and `Filter` have the dataset id tag, which breaks my assumption. Since the query plan is now mutable, I think it's better to limit the life cycle of a query plan instance. We can clone the query plan between analyzer, optimizer and planner, so that the life cycle is limited to one stage. ## How was this patch tested? new test
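The problem described in this PR can be sketched with a toy tree. This is a sketch under stated assumptions, not Spark code: `Node`, `deepClone`, `removeAlias` and `taggedNodes` are hypothetical, and a mutable map stands in for `TreeNodeTag` storage. It shows how a later stage that mutates shared nodes leaks tags into the earlier plan, and how cloning between stages avoids that.

```scala
import scala.collection.mutable

object PlanCloneSketch {
  // Hypothetical miniature of a tree node that carries mutable tags.
  final class Node(val name: String, val children: Seq[Node]) {
    val tags: mutable.Map[String, Int] = mutable.Map.empty

    // Deep copy with a fresh tag map per node, so later stages cannot mutate this tree.
    def deepClone(): Node = {
      val copy = new Node(name, children.map(_.deepClone()))
      copy.tags ++= tags
      copy
    }
  }

  // Stand-in for an optimizer rule: drop the alias and carry its tags over to the child.
  def removeAlias(plan: Node): Node =
    if (plan.name == "SubqueryAlias") {
      val child = plan.children.head
      child.tags ++= plan.tags // mutates the child node in place
      child
    } else plan

  // Names of all nodes in the tree that carry the "datasetId" tag.
  def taggedNodes(plan: Node): Seq[String] =
    (if (plan.tags.contains("datasetId")) Seq(plan.name) else Nil) ++
      plan.children.flatMap(taggedNodes)
}
```

Starting from a `SubqueryAlias` over a `Filter` with only the alias tagged, calling `removeAlias(analyzed.deepClone())` leaves `taggedNodes(analyzed)` as `Seq("SubqueryAlias")`. Calling `removeAlias(analyzed)` directly makes it `Seq("SubqueryAlias", "Filter")`, which is the duplication the PR description reports.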
[GitHub] [spark] huaxingao commented on a change in pull request #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base
huaxingao commented on a change in pull request #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base URL: https://github.com/apache/spark/pull/25101#discussion_r302384736 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except.sql ## @@ -0,0 +1,58 @@ +-- This test file was converted from except.sql. +-- Tests different scenarios of except operation +create temporary view t1 as select * from values + ("one", 1), + ("two", 2), + ("three", 3), + ("one", NULL) + as t1(k, v); + +create temporary view t2 as select * from values + ("one", 1), + ("two", 22), + ("one", 5), + ("one", NULL), + (NULL, 5) + as t2(k, v); + + +-- Except operation that will be replaced by left anti join +SELECT * FROM t1 EXCEPT SELECT * FROM t2; + + +-- Except operation that will be replaced by Filter: SPARK-22181 +SELECT * FROM t1 EXCEPT SELECT * FROM t1 where udf(v) <> 1 and v <> udf(2); + + +-- Except operation that will be replaced by Filter: SPARK-22181 +SELECT * FROM t1 where udf(v) <> 1 and v <> udf(22) EXCEPT SELECT * FROM t1 where udf(v) <> 2 and v >= udf(3); + + +-- Except operation that will be replaced by Filter: SPARK-22181 +SELECT t1.* FROM t1, t2 where t1.k = t2.k +EXCEPT +SELECT t1.* FROM t1, t2 where t1.k = t2.k and t1.k != udf('one'); + + +-- Except operation that will be replaced by left anti join +SELECT * FROM t2 where v >= udf(1) and udf(v) <> 22 EXCEPT SELECT * FROM t1; + + +-- Except operation that will be replaced by left anti join +SELECT (SELECT min(udf(k)) FROM t2 WHERE t2.k = t1.k) min_t2 FROM t1 +MINUS +SELECT (SELECT udf(min(k)) FROM t2) abs_min_t2 FROM t1 WHERE t1.k = udf('one'); + + +-- Except operation that will be replaced by left anti join +SELECT t1.k +FROM t1 +WHERE t1.v <= (SELECT max(udf(t2.v)) +FROM t2 +WHERE t2.k = t1.k) Review comment: will change to ```udf(t2.k) = udf(t1.k)```
[GitHub] [spark] huaxingao commented on a change in pull request #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base
huaxingao commented on a change in pull request #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base URL: https://github.com/apache/spark/pull/25101#discussion_r302384711 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except.sql ## @@ -0,0 +1,58 @@ +-- This test file was converted from except.sql. +-- Tests different scenarios of except operation +create temporary view t1 as select * from values + ("one", 1), + ("two", 2), + ("three", 3), + ("one", NULL) + as t1(k, v); + +create temporary view t2 as select * from values + ("one", 1), + ("two", 22), + ("one", 5), + ("one", NULL), + (NULL, 5) + as t2(k, v); + + +-- Except operation that will be replaced by left anti join +SELECT * FROM t1 EXCEPT SELECT * FROM t2; + + +-- Except operation that will be replaced by Filter: SPARK-22181 +SELECT * FROM t1 EXCEPT SELECT * FROM t1 where udf(v) <> 1 and v <> udf(2); + + +-- Except operation that will be replaced by Filter: SPARK-22181 +SELECT * FROM t1 where udf(v) <> 1 and v <> udf(22) EXCEPT SELECT * FROM t1 where udf(v) <> 2 and v >= udf(3); + + +-- Except operation that will be replaced by Filter: SPARK-22181 +SELECT t1.* FROM t1, t2 where t1.k = t2.k +EXCEPT +SELECT t1.* FROM t1, t2 where t1.k = t2.k and t1.k != udf('one'); + + +-- Except operation that will be replaced by left anti join +SELECT * FROM t2 where v >= udf(1) and udf(v) <> 22 EXCEPT SELECT * FROM t1; + + +-- Except operation that will be replaced by left anti join +SELECT (SELECT min(udf(k)) FROM t2 WHERE t2.k = t1.k) min_t2 FROM t1 +MINUS +SELECT (SELECT udf(min(k)) FROM t2) abs_min_t2 FROM t1 WHERE t1.k = udf('one'); + + +-- Except operation that will be replaced by left anti join +SELECT t1.k +FROM t1 +WHERE t1.v <= (SELECT max(udf(t2.v)) Review comment: I tried ```udf(max(udf(t2.v)))``` query 8 schema and output changed to ``` -- !query 8 schema struct<> -- !query 8 output java.lang.UnsupportedOperationException Cannot evaluate expression: 
udf(null) ``` The expected results are ``` -- !query 8 schema struct -- !query 8 output two ```
[GitHub] [spark] huaxingao commented on a change in pull request #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base
huaxingao commented on a change in pull request #25101: [SPARK-28277][SQL][PYTHON][TESTS] Convert and port 'except.sql' into UDF test base URL: https://github.com/apache/spark/pull/25101#discussion_r302384662 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except.sql ## @@ -0,0 +1,58 @@ +-- This test file was converted from except.sql. +-- Tests different scenarios of except operation +create temporary view t1 as select * from values + ("one", 1), + ("two", 2), + ("three", 3), + ("one", NULL) + as t1(k, v); + +create temporary view t2 as select * from values + ("one", 1), + ("two", 22), + ("one", 5), + ("one", NULL), + (NULL, 5) + as t2(k, v); + + +-- Except operation that will be replaced by left anti join +SELECT * FROM t1 EXCEPT SELECT * FROM t2; Review comment: Sure. Will change to ``` SELECT udf(k), udf(v) FROM t1 EXCEPT SELECT udf(k), udf(v) FROM t2; ```
[GitHub] [spark] HyukjinKwon commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
HyukjinKwon commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation URL: https://github.com/apache/spark/pull/25110#issuecomment-510352059 I double-checked that it works with JDK 11, just to be doubly sure, @dongjoon-hyun: ``` Using /.../jdk-11.0.3.jdk/Contents/Home as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. [info] Loading project definition from /.../spark/project [info] Updating {file:/.../spark/project/}spark-build... ... [info] SQLQueryTestSuite: ... [info] - udf/pgSQL/udf-aggregates_part1.sql - Scala UDF (17 seconds, 228 milliseconds) [info] - udf/pgSQL/udf-aggregates_part1.sql - Regular Python UDF (36 seconds, 170 milliseconds) [info] - udf/pgSQL/udf-aggregates_part1.sql - Scalar Pandas UDF (41 seconds, 132 milliseconds) ... ```
[GitHub] [spark] SparkQA commented on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
SparkQA commented on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses URL: https://github.com/apache/spark/pull/25029#issuecomment-510351413 **[Test build #107512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107512/testReport)** for PR 25029 at commit [`45f0642`](https://github.com/apache/spark/commit/45f064230aa820627c1ed42662d50ab2b672adb1).
[GitHub] [spark] maropu edited a comment on issue #24860: [SPARK-28034][SQL][TEST] Port with.sql
maropu edited a comment on issue #24860: [SPARK-28034][SQL][TEST] Port with.sql URL: https://github.com/apache/spark/pull/24860#issuecomment-510350433 ~Looks nice, will do now~ oh, @wangyum seems to be working on it.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
AmplabJenkins removed a comment on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses URL: https://github.com/apache/spark/pull/25029#issuecomment-510350780 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
AmplabJenkins removed a comment on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses URL: https://github.com/apache/spark/pull/25029#issuecomment-510350787 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12638/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
AmplabJenkins commented on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses URL: https://github.com/apache/spark/pull/25029#issuecomment-510350780 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
AmplabJenkins commented on issue #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses URL: https://github.com/apache/spark/pull/25029#issuecomment-510350787 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12638/ Test PASSed.
[GitHub] [spark] maropu commented on issue #24860: [SPARK-28034][SQL][TEST] Port with.sql
maropu commented on issue #24860: [SPARK-28034][SQL][TEST] Port with.sql URL: https://github.com/apache/spark/pull/24860#issuecomment-510350433 Looks nice, will do now
[GitHub] [spark] peter-toth commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses
peter-toth commented on a change in pull request #25029: [SPARK-28228][SQL] Fix substitution order of nested WITH clauses URL: https://github.com/apache/spark/pull/25029#discussion_r302382497 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala ## @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.expressions.SubqueryExpression +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, With} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.internal.SQLConf + +/** + * Analyze WITH nodes and substitute child plan with CTE definitions. 
+ */ +object CTESubstitution extends Rule[LogicalPlan] { + def apply(plan: LogicalPlan): LogicalPlan = if (SQLConf.get.legacyCTESubstitutionEnabled) { +legacyTraverseAndSubstituteCTE(plan) + } else { +traverseAndSubstituteCTE(plan, false) + } + + private def legacyTraverseAndSubstituteCTE(plan: LogicalPlan): LogicalPlan = { +plan.resolveOperatorsUp { + case With(child, relations) => +// substitute CTE expressions right-to-left to resolve references to previous CTEs: +// with a as (select * from t), b as (select * from a) select * from b +relations.foldRight(child) { + case ((cteName, ctePlan), currentPlan) => substituteCTE(currentPlan, cteName, ctePlan) +} +} + } + + /** + * Traverse the plan and expression nodes as a tree and replace matching references to CTE + * definitions. + * - If the rule encounters a WITH node then it substitutes the child of the node with CTE + * definitions of the node right-to-left order as a definition can reference to a previous + * one. + * For example the following query is valid: + * WITH + * t AS (SELECT 1), + * t2 AS (SELECT * FROM t) + * SELECT * FROM t2 + * - If a CTE definition contains an inner WITH node then substitution of inner should take + * precedence because it can shadow an outer CTE definition. + * For example the following query should return 2: + * WITH + * t AS (SELECT 1), + * t2 AS ( + * WITH t AS (SELECT 2) + * SELECT * FROM t + * ) + * SELECT * FROM t2 + * - If a CTE definition contains a subquery that contains an inner WITH node then substitution + * of inner should take precedence because it can shadow an outer CTE definition. + * For example the following query should return 2: + * WITH t AS (SELECT 1 AS c) + * SELECT max(c) FROM ( + * WITH t AS (SELECT 2 AS c) + * SELECT * FROM t + * ) + * - If a CTE definition contains a subquery expression that contains an inner WITH node then + * substitution of inner should take precedence because it can shadow an outer CTE + * definition. 
+ * For example the following query should return 2: + * WITH t AS (SELECT 1) + * SELECT ( + * WITH t AS (SELECT 2) + * SELECT * FROM t + * ) + * @param plan the plan to be traversed + * @param inTraverse whether the current traverse is called from another traverse, only in this + * case name collision can occur + * @return then plan where CTE substitution is applied + */ + private def traverseAndSubstituteCTE(plan: LogicalPlan, inTraverse: Boolean): LogicalPlan = { +plan.resolveOperatorsUp { + case With(child: LogicalPlan, relations) => +val traversedChild = child transformExpressions { + case e: SubqueryExpression => e.withNewPlan(traverseAndSubstituteCTE(e.plan, true)) +} + +relations.foldRight(traversedChild) { + case ((cteName, ctePlan), currentPlan) => +lazy val substitutedCTEPlan = traverseAndSubstituteCTE(ctePlan, true) Review comment: Unfortunately, if I remove call by name parameter passing from `substituteCTE` then `lazy val substitutedCTEPlan` will be evaluated before calling `substituteCTE` with it. I think both call by name passing of `ctePlan` and `lazy val substitutedCTEPlan` is required to make sure that: - if `cteNa
[GitHub] [spark] AmplabJenkins removed a comment on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
AmplabJenkins removed a comment on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation URL: https://github.com/apache/spark/pull/25110#issuecomment-510348570 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
SparkQA commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation URL: https://github.com/apache/spark/pull/25110#issuecomment-510349217 **[Test build #107511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107511/testReport)** for PR 25110 at commit [`a0a6219`](https://github.com/apache/spark/commit/a0a6219885293bf0993c41f93329bc504426340f).
[GitHub] [spark] AmplabJenkins commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
AmplabJenkins commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation URL: https://github.com/apache/spark/pull/25110#issuecomment-510348570 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
AmplabJenkins commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation URL: https://github.com/apache/spark/pull/25110#issuecomment-510348576 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12637/ Test PASSed.
[GitHub] [spark] maropu commented on a change in pull request #25096: [SPARK-28334][SQL][TEST] Port select.sql
maropu commented on a change in pull request #25096: [SPARK-28334][SQL][TEST] Port select.sql URL: https://github.com/apache/spark/pull/25096#discussion_r302379700 ## File path: sql/core/src/test/resources/sql-tests/inputs/pgSQL/select.sql ## @@ -0,0 +1,282 @@ +-- +-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group +-- +-- +-- SELECT +-- Test int8 64-bit integers. +-- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/select.sql +-- +create or replace temporary view onek2 as select * from onek; +create or replace temporary view INT8_TBL as select * from values + (cast(trim(' 123 ') as bigint), cast(trim(' 456') as bigint)), + (cast(trim('123 ') as bigint),cast('4567890123456789' as bigint)), + (cast('4567890123456789' as bigint),cast('123' as bigint)), + (cast(+4567890123456789 as bigint),cast('4567890123456789' as bigint)), + (cast('+4567890123456789' as bigint),cast('-4567890123456789' as bigint)) + as INT8_TBL(q1, q2); + +-- btree index +-- awk '{if($1<10){print;}else{next;}}' onek.data | sort +0n -1 +-- +SELECT * FROM onek + WHERE onek.unique1 < 10 + ORDER BY onek.unique1; + +-- [SPARK-28010] Support ORDER BY ... 
USING syntax +-- +-- awk '{if($1<20){print $1,$14;}else{next;}}' onek.data | sort +0nr -1 +-- +SELECT onek.unique1, onek.stringu1 FROM onek + WHERE onek.unique1 < 20 + ORDER BY unique1 DESC; + +-- +-- awk '{if($1>980){print $1,$14;}else{next;}}' onek.data | sort +1d -2 +-- +SELECT onek.unique1, onek.stringu1 FROM onek + WHERE onek.unique1 > 980 + ORDER BY stringu1 ASC; + +-- +-- awk '{if($1>980){print $1,$16;}else{next;}}' onek.data | +-- sort +1d -2 +0nr -1 +-- +SELECT onek.unique1, onek.string4 FROM onek + WHERE onek.unique1 > 980 + ORDER BY string4 ASC, unique1 DESC; + +-- +-- awk '{if($1>980){print $1,$16;}else{next;}}' onek.data | +-- sort +1dr -2 +0n -1 +-- +SELECT onek.unique1, onek.string4 FROM onek + WHERE onek.unique1 > 980 + ORDER BY string4 DESC, unique1 ASC; + +-- +-- awk '{if($1<20){print $1,$16;}else{next;}}' onek.data | +-- sort +0nr -1 +1d -2 +-- +SELECT onek.unique1, onek.string4 FROM onek + WHERE onek.unique1 < 20 + ORDER BY unique1 DESC, string4 ASC; + +-- +-- awk '{if($1<20){print $1,$16;}else{next;}}' onek.data | +-- sort +0n -1 +1dr -2 +-- +SELECT onek.unique1, onek.string4 FROM onek + WHERE onek.unique1 < 20 + ORDER BY unique1 ASC, string4 DESC; + +-- +-- test partial btree indexes +-- +-- As of 7.2, planner probably won't pick an indexscan without stats, +-- so ANALYZE first. Also, we want to prevent it from picking a bitmapscan +-- followed by sort, because that could hide index ordering problems. 
+-- +-- ANALYZE onek2; + +-- SET enable_seqscan TO off; +-- SET enable_bitmapscan TO off; +-- SET enable_sort TO off; + +-- +-- awk '{if($1<10){print $0;}else{next;}}' onek.data | sort +0n -1 +-- +SELECT onek2.* FROM onek2 WHERE onek2.unique1 < 10; + +-- +-- awk '{if($1<20){print $1,$14;}else{next;}}' onek.data | sort +0nr -1 +-- +SELECT onek2.unique1, onek2.stringu1 FROM onek2 +WHERE onek2.unique1 < 20 +ORDER BY unique1 DESC; + +-- +-- awk '{if($1>980){print $1,$14;}else{next;}}' onek.data | sort +1d -2 +-- +SELECT onek2.unique1, onek2.stringu1 FROM onek2 + WHERE onek2.unique1 > 980; + +-- RESET enable_seqscan; +-- RESET enable_bitmapscan; +-- RESET enable_sort; + +-- [SPARK-28329] SELECT INTO syntax +CREATE TABLE tmp USING parquet AS Review comment: How about commenting out the unsupported syntax instead of removing it? I feel it's better to keep the original statements as much as possible.
[GitHub] [spark] maropu edited a comment on issue #25096: [SPARK-28334][SQL][TEST] Port select.sql
maropu edited a comment on issue #25096: [SPARK-28334][SQL][TEST] Port select.sql URL: https://github.com/apache/spark/pull/25096#issuecomment-510344390 I left one comment, but otherwise it looks OK to me.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
AmplabJenkins removed a comment on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation URL: https://github.com/apache/spark/pull/25110#issuecomment-510346586 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
SparkQA commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation URL: https://github.com/apache/spark/pull/25110#issuecomment-510347086 **[Test build #107510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107510/testReport)** for PR 25110 at commit [`e7143de`](https://github.com/apache/spark/commit/e7143defbb89d10d4b877969fc7497d896cde161).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
AmplabJenkins removed a comment on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation URL: https://github.com/apache/spark/pull/25110#issuecomment-510346597 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12636/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
AmplabJenkins commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation URL: https://github.com/apache/spark/pull/25110#issuecomment-510346586 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
AmplabJenkins commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation URL: https://github.com/apache/spark/pull/25110#issuecomment-510346597 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12636/ Test PASSed.
[GitHub] [spark] HyukjinKwon commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
HyukjinKwon commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation URL: https://github.com/apache/spark/pull/25110#issuecomment-510345861 FYI @viirya, @skonto, @imback82, @huaxingao, @vinodkc, @manuzhang since you guys are working on this.
[GitHub] [spark] HyukjinKwon commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
HyukjinKwon commented on issue #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation URL: https://github.com/apache/spark/pull/25110#issuecomment-510345469 cc @dongjoon-hyun
[GitHub] [spark] HyukjinKwon opened a new pull request #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation
HyukjinKwon opened a new pull request #25110: [SPARK-28270][SQL][FOLLOW-UP] Explicitly cast into integer/long in udf-aggregates_part1.sql to avoid Python float limitation URL: https://github.com/apache/spark/pull/25110 ## What changes were proposed in this pull request? The tests added at https://github.com/apache/spark/pull/25069 seem flaky in some environments. See https://github.com/apache/spark/pull/25069#issuecomment-510338469 Python's string representation of floats can make the tests flaky. See https://docs.python.org/3/tutorial/floatingpoint.html. I think it's safer to cast explicitly everywhere a UDF returns a float (or a double). This PR proposes casting such results to integer or long explicitly to make the test cases robust. ## How was this patch tested? Manually tested locally.
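The flakiness described in this pull request can be illustrated with a minimal Python sketch. The values and variable names below are hypothetical and are not taken from `udf-aggregates_part1.sql`; the 12-significant-digit format is only an assumed stand-in for an alternative way of rendering the same float.

```python
# Illustrative only: these values do not come from the Spark test suite.
total = 0.1 + 0.2  # neither 0.1 nor 0.2 is exactly representable in binary

# Two ways of turning the same float into text.
shortest_repr = repr(total)      # shortest string that round-trips in Python 3
twelve_digits = "%.12g" % total  # assumed alternative: 12 significant digits

# The same number yields different strings, so a golden file that stores the
# text form can mismatch depending on how the value was rendered.
renderings_differ = shortest_repr != twelve_digits

# Casting to an integer before comparing drops the fractional digits, so two
# floats that differ only in the last bits compare equal afterwards.
as_int_a = int(3.0000000000000004)
as_int_b = int(3.0)
casts_agree = as_int_a == as_int_b

print(shortest_repr, twelve_digits, renderings_differ, casts_agree)
```

Under these assumptions, comparing the integer results is stable, while comparing the text form of the float depends on the formatting that produced it.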
[GitHub] [spark] maropu commented on issue #25096: [SPARK-28334][SQL][TEST] Port select.sql
maropu commented on issue #25096: [SPARK-28334][SQL][TEST] Port select.sql URL: https://github.com/apache/spark/pull/25096#issuecomment-510344390 sure, I'll check
[GitHub] [spark] beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
beliefer commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r302342786 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -142,10 +145,11 @@ case class Like(left: Expression, right: Expression) extends StringRegexExpressi } else { val pattern = ctx.freshName("pattern") val rightStr = ctx.freshName("rightStr") + val escapeChar = escapeCharOpt.getOrElse("") Review comment: I changed `""` to `""""""`, and it's OK now.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product
AmplabJenkins removed a comment on issue #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product URL: https://github.com/apache/spark/pull/25109#issuecomment-510340877 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product
SparkQA commented on issue #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product URL: https://github.com/apache/spark/pull/25109#issuecomment-510341474 **[Test build #107509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107509/testReport)** for PR 25109 at commit [`86ca933`](https://github.com/apache/spark/commit/86ca93380d93565c655eb7a959645536f2c5c64f).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product
AmplabJenkins removed a comment on issue #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product URL: https://github.com/apache/spark/pull/25109#issuecomment-510340883 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12635/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-510341195 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-510341202 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107501/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-510341195 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-510341202 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107501/ Test PASSed.
[GitHub] [spark] wangyum commented on issue #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product
wangyum commented on issue #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product URL: https://github.com/apache/spark/pull/25109#issuecomment-510341125 @maropu I didn't enable `spark.sql.parser.ansi.enabled` because: ```sql spark-sql> set spark.sql.parser.ansi.enabled=true; spark.sql.parser.ansi.enabled true spark-sql> select 1 as false; Error in query: no viable alternative at input 'false'(line 1, pos 12) == SQL == select 1 as false ^^^ spark-sql> select 1 as minus; Error in query: no viable alternative at input 'minus'(line 1, pos 12) == SQL == select 1 as minus ^^^ ``` Is this expected?
[GitHub] [spark] maropu commented on a change in pull request #25074: [SPARK-27924]Support ANSI SQL Boolean-Predicate syntax
maropu commented on a change in pull request #25074: [SPARK-27924]Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#discussion_r302374072

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/booleanExpressions.scala ##

@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.BooleanType
+
+/**
+ * String to indicate which boolean test selected.
+ */
+object BooleanTest {
+  val TRUE = "TRUE"
+  val FALSE = "FALSE"
+  val UNKNOWN = "UNKNOWN"
+
+  def calculate(input: Any, booleanValue: String): Boolean = {
+    booleanValue match {
+      case TRUE => input == true
+      case FALSE => input == false
+      case UNKNOWN => input == null
+      case _ => throw new AnalysisException("Boolean test value must be one of TRUE, " +
+        "FALSE and UNKNOWN.")
+    }
+  }
+}
+
+/**
+ * Test the value of an expression is true, false, or unknown.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, booleanValue) - Returns true if `expr` equals booleanValue, " +
+    "or false otherwise.",
+  arguments = """
+    Arguments:
+      * expr - a boolean expression
+      * booleanValue - a boolean value represented by a string. booleanValue must be one
+          of TRUE, FALSE and UNKNOWN.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(1 > 2, true);
+       false
+      > SELECT _FUNC_(2 > 1, true);
+       true
+  """)
+case class BooleanTest(child: Expression, booleanValue: String)

Review comment: As another design choice, how about handling `IS UNKNOWN` just as `IS NULL`, then changing `booleanValue: String` into `booleanValue: Boolean`?
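The design choice raised in this review comment can be sketched outside Spark. The following is a minimal Python model, not Spark's implementation: `None` stands in for SQL NULL, and the function names `is_unknown` and `boolean_test` are hypothetical. It assumes `IS UNKNOWN` is handled exactly like a null check, so the remaining `IS TRUE` / `IS FALSE` cases need only a Boolean flag instead of a string.

```python
from typing import Optional


def is_unknown(value: Optional[bool]) -> bool:
    """Model of `expr IS UNKNOWN`, handled the same way as `expr IS NULL`."""
    return value is None


def boolean_test(value: Optional[bool], boolean_value: bool) -> bool:
    """Model of `expr IS TRUE` (boolean_value=True) or `expr IS FALSE`.

    The result is never unknown: a NULL input is neither TRUE nor FALSE.
    """
    return value is not None and value == boolean_value


if __name__ == "__main__":
    # Print the truth table for the three possible inputs.
    for v in (True, False, None):
        print(v, boolean_test(v, True), boolean_test(v, False), is_unknown(v))
```

With this split there is no invalid `booleanValue` left to reject, so the error branch of the string-based `calculate` would have no counterpart.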
[GitHub] [spark] SparkQA removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
SparkQA removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-510303053 **[Test build #107501 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107501/testReport)** for PR 25090 at commit [`a09df4b`](https://github.com/apache/spark/commit/a09df4b5dc90c93f05d68fd6695ccb2de663895c).
[GitHub] [spark] AmplabJenkins commented on issue #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product
AmplabJenkins commented on issue #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product URL: https://github.com/apache/spark/pull/25109#issuecomment-510340877 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product
AmplabJenkins commented on issue #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product URL: https://github.com/apache/spark/pull/25109#issuecomment-510340883 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12635/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
SparkQA commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-510340755 **[Test build #107501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107501/testReport)** for PR 25090 at commit [`a09df4b`](https://github.com/apache/spark/commit/a09df4b5dc90c93f05d68fd6695ccb2de663895c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base
dongjoon-hyun commented on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base URL: https://github.com/apache/spark/pull/25069#issuecomment-510340656 Thanks a lot!
[GitHub] [spark] wangyum opened a new pull request #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product
wangyum opened a new pull request #25109: [SPARK-28343][SQL][TEST] PostgreSQL test should enable cartesian product URL: https://github.com/apache/spark/pull/25109

## What changes were proposed in this pull request?

This PR enables cartesian products for the PostgreSQL tests.

## How was this patch tested?

Manual tests: run `test.sql` in the [pgSQL](https://github.com/apache/spark/tree/master/sql/core/src/test/resources/sql-tests/inputs/pgSQL) directory and in the [inputs](https://github.com/apache/spark/tree/master/sql/core/src/test/resources/sql-tests/inputs) directory:

```sql
cat <<EOF > test.sql
create or replace temporary view t1 as select * from (values(1), (2)) as v (val);
create or replace temporary view t2 as select * from (values(2), (1)) as v (val);
select t1.*, t2.* from t1 join t2;
EOF
```
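The final query in `test.sql` joins `t1` and `t2` without any join condition, which is a cartesian product. As an illustration only, here is a small Python sketch of what such a join returns for the two views above. The helper name `cross_join` is hypothetical, and the row order Spark actually produces is not guaranteed to match.

```python
from itertools import product

t1 = [1, 2]  # rows of view t1 (single column `val`)
t2 = [2, 1]  # rows of view t2 (single column `val`)


def cross_join(left, right):
    """Pair every left row with every right row, as a join with no condition does."""
    return list(product(left, right))


rows = cross_join(t1, t2)

if __name__ == "__main__":
    # 2 rows x 2 rows yields 4 result rows.
    for row in rows:
        print(row)
```

The result size is the product of the input sizes, which is why engines often require such joins to be enabled explicitly.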
[GitHub] [spark] SparkQA commented on issue #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode
SparkQA commented on issue #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode URL: https://github.com/apache/spark/pull/24637#issuecomment-510339651 **[Test build #107508 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107508/testReport)** for PR 24637 at commit [`a453d8c`](https://github.com/apache/spark/commit/a453d8c84644b1edf38ff90821b0dfff85f50c2c).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25108: [SPARK-28321][SQL] 0-args Java UDF should not be called only once
AmplabJenkins removed a comment on issue #25108: [SPARK-28321][SQL] 0-args Java UDF should not be called only once URL: https://github.com/apache/spark/pull/25108#issuecomment-510339208 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode
AmplabJenkins removed a comment on issue #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode URL: https://github.com/apache/spark/pull/24637#issuecomment-510339171 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25108: [SPARK-28321][SQL] 0-args Java UDF should not be called only once
SparkQA commented on issue #25108: [SPARK-28321][SQL] 0-args Java UDF should not be called only once URL: https://github.com/apache/spark/pull/25108#issuecomment-510339633 **[Test build #107507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107507/testReport)** for PR 25108 at commit [`c725002`](https://github.com/apache/spark/commit/c7250023de9613276954989c23168920616ad0f1).
[GitHub] [spark] HyukjinKwon commented on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base
HyukjinKwon commented on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base URL: https://github.com/apache/spark/pull/25069#issuecomment-510339534

It actually passed locally for me as well, but it seems to depend on the specific Python version or OS:

```
Expected "32.6[64]", but got "32.6[7]" Result did not match for query #1
SELECT udf(avg(a)) AS avg_32 FROM aggtest WHERE a < 100
```

I think it's the floating-point behavior described at https://docs.python.org/3/tutorial/floatingpoint.html. Let me just cast it to int, the same workaround I used elsewhere in this PR.
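The exact cause of the `32.664` versus `32.67` mismatch is not established in this thread. The sketch below only illustrates the general point behind the cast-to-int workaround: two mathematically equal floating-point computations can differ in their last digits, while their integer parts agree. The variable names are illustrative and nothing here is taken from the Spark test suite.

```python
import math

# The same three values summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# Comparing the raw floats (or their string forms) is fragile.
exact_match = left_to_right == right_to_left

# Comparing after a cast to int, or with a tolerance, is stable.
int_match = int(left_to_right) == int(right_to_left)
close_match = math.isclose(left_to_right, right_to_left)

if __name__ == "__main__":
    print(repr(left_to_right), repr(right_to_left))
    print(exact_match, int_match, close_match)
```

Casting to int discards precision that the golden-file comparison cannot rely on, at the cost of no longer checking the fractional digits.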
[GitHub] [spark] AmplabJenkins removed a comment on issue #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode
AmplabJenkins removed a comment on issue #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode URL: https://github.com/apache/spark/pull/24637#issuecomment-510339180 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12634/ Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base
dongjoon-hyun commented on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base URL: https://github.com/apache/spark/pull/25069#issuecomment-510339155 Yes. Please check locally first, because the above report is on JDK 11.
[GitHub] [spark] AmplabJenkins commented on issue #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode
AmplabJenkins commented on issue #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode URL: https://github.com/apache/spark/pull/24637#issuecomment-510339171 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25108: [SPARK-28321][SQL] 0-args Java UDF should not be called only once
AmplabJenkins commented on issue #25108: [SPARK-28321][SQL] 0-args Java UDF should not be called only once URL: https://github.com/apache/spark/pull/25108#issuecomment-510339214 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12633/ Test PASSed.
[GitHub] [spark] dongjoon-hyun edited a comment on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base
dongjoon-hyun edited a comment on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base URL: https://github.com/apache/spark/pull/25069#issuecomment-510339155 Thanks. Yes. Please check locally first, because the above report is on JDK 11.
[GitHub] [spark] AmplabJenkins commented on issue #25108: [SPARK-28321][SQL] 0-args Java UDF should not be called only once
AmplabJenkins commented on issue #25108: [SPARK-28321][SQL] 0-args Java UDF should not be called only once URL: https://github.com/apache/spark/pull/25108#issuecomment-510339208 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode
AmplabJenkins commented on issue #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode URL: https://github.com/apache/spark/pull/24637#issuecomment-510339180 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12634/ Test PASSed.
[GitHub] [spark] HyukjinKwon commented on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base
HyukjinKwon commented on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base URL: https://github.com/apache/spark/pull/25069#issuecomment-510339061 Will take a look, and revert if it's going to take longer.
[GitHub] [spark] HyukjinKwon opened a new pull request #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base
HyukjinKwon opened a new pull request #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base URL: https://github.com/apache/spark/pull/25069 ## What changes were proposed in this pull request? This PR adds some tests converted from `pgSQL/aggregates_part1.sql` to test UDFs. Please see the contribution guide of this umbrella ticket - [SPARK-27921](https://issues.apache.org/jira/browse/SPARK-27921). This PR also contains two minor fixes: 1. Change the name of the Scala UDF from `UDF:name(...)` to `name(...)` to be consistent with Python's. 2. Fix the Scala UDF in `IntegratedUDFTestUtils.scala` to handle `null` in strings. Diff comparing to 'pgSQL/aggregates_part1.sql' ```diff diff --git a/sql/core/src/test/resources/sql-tests/results/pgSQL/aggregates_part1.sql.out b/sql/core/src/test/resources/sql-tests/results/udf/pgSQL/udf-aggregates_part1.sql.out index 51ca1d55869..124fdd6416e 100644 --- a/sql/core/src/test/resources/sql-tests/results/pgSQL/aggregates_part1.sql.out +++ b/sql/core/src/test/resources/sql-tests/results/udf/pgSQL/udf-aggregates_part1.sql.out @@ -3,7 +3,7 @@ -- !query 0 -SELECT avg(four) AS avg_1 FROM onek +SELECT avg(udf(four)) AS avg_1 FROM onek -- !query 0 schema struct -- !query 0 output @@ -11,15 +11,15 @@ struct -- !query 1 -SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100 +SELECT udf(avg(a)) AS avg_32 FROM aggtest WHERE a < 100 -- !query 1 schema -struct +struct -- !query 1 output 32.664 -- !query 2 -select CAST(avg(b) AS Decimal(10,3)) AS avg_107_943 FROM aggtest +select CAST(avg(udf(b)) AS Decimal(10,3)) AS avg_107_943 FROM aggtest -- !query 2 schema struct -- !query 2 output @@ -27,285 +27,286 @@ struct -- !query 3 -SELECT sum(four) AS sum_1500 FROM onek +SELECT sum(udf(four)) AS sum_1500 FROM onek -- !query 3 schema -struct +struct -- !query 3 output -1500 +1500.0 -- !query 4 -SELECT sum(a) AS sum_198 FROM aggtest +SELECT udf(sum(a)) AS sum_198 FROM aggtest -- !query 4 schema -struct +struct -- !query 4 output 
198 -- !query 5 -SELECT sum(b) AS avg_431_773 FROM aggtest +SELECT udf(udf(sum(b))) AS avg_431_773 FROM aggtest -- !query 5 schema -struct +struct -- !query 5 output 431.77260909229517 -- !query 6 -SELECT max(four) AS max_3 FROM onek +SELECT udf(max(four)) AS max_3 FROM onek -- !query 6 schema -struct +struct -- !query 6 output 3 -- !query 7 -SELECT max(a) AS max_100 FROM aggtest +SELECT max(udf(a)) AS max_100 FROM aggtest -- !query 7 schema -struct +struct -- !query 7 output -100 +56 -- !query 8 -SELECT max(aggtest.b) AS max_324_78 FROM aggtest +SELECT CAST(udf(udf(max(aggtest.b))) AS int) AS max_324_78 FROM aggtest -- !query 8 schema -struct +struct -- !query 8 output -324.78 +324 -- !query 9 -SELECT stddev_pop(b) FROM aggtest +SELECT CAST(stddev_pop(udf(b)) AS int) FROM aggtest -- !query 9 schema -struct +struct -- !query 9 output -131.10703231895047 +131 -- !query 10 -SELECT stddev_samp(b) FROM aggtest +SELECT udf(stddev_samp(b)) FROM aggtest -- !query 10 schema -struct +struct -- !query 10 output 151.38936080399804 -- !query 11 -SELECT var_pop(b) FROM aggtest +SELECT CAST(var_pop(udf(b)) as int) FROM aggtest -- !query 11 schema -struct +struct -- !query 11 output -17189.053923482323 +17189 -- !query 12 -SELECT var_samp(b) FROM aggtest +SELECT udf(var_samp(b)) FROM aggtest -- !query 12 schema -struct +struct -- !query 12 output 22918.738564643096 -- !query 13 -SELECT stddev_pop(CAST(b AS Decimal(38,0))) FROM aggtest +SELECT udf(stddev_pop(CAST(b AS Decimal(38,0 FROM aggtest -- !query 13 schema -struct +struct -- !query 13 output 131.18117242958306 -- !query 14 -SELECT stddev_samp(CAST(b AS Decimal(38,0))) FROM aggtest +SELECT stddev_samp(CAST(udf(b) AS Decimal(38,0))) FROM aggtest -- !query 14 schema -struct +struct -- !query 14 output 151.47497042966097 -- !query 15 -SELECT var_pop(CAST(b AS Decimal(38,0))) FROM aggtest +SELECT udf(var_pop(CAST(b AS Decimal(38,0 FROM aggtest -- !query 15 schema -struct +struct -- !query 15 output 17208.5 -- !query 16 -SELECT 
var_samp(CAST(b AS Decimal(38,0))) FROM aggtest +SELECT var_samp(udf(CAST(b AS Decimal(38,0 FROM aggtest -- !query 16 schema -struct +struct -- !query 16 output 22944.6668 -- !query 17 -SELECT var_pop(1.0), var_samp(2.0) +SELECT udf(var_pop(1.0)), var_sam
[GitHub] [spark] HyukjinKwon closed pull request #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base
HyukjinKwon closed pull request #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base URL: https://github.com/apache/spark/pull/25069
[GitHub] [spark] HyukjinKwon commented on issue #25108: [SPARK-28321][SQL] 0-args Java UDF should not be called only once
HyukjinKwon commented on issue #25108: [SPARK-28321][SQL] 0-args Java UDF should not be called only once URL: https://github.com/apache/spark/pull/25108#issuecomment-510338860 cc @cloud-fan
[GitHub] [spark] dongjoon-hyun edited a comment on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base
dongjoon-hyun edited a comment on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base URL: https://github.com/apache/spark/pull/25069#issuecomment-510338586 Could you investigate this? For the others, this commit is still being tested.
[GitHub] [spark] dongjoon-hyun commented on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base
dongjoon-hyun commented on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base URL: https://github.com/apache/spark/pull/25069#issuecomment-510338586 Could you investigate this?
[GitHub] [spark] HyukjinKwon opened a new pull request #25108: [SPARK-28321][SQL] 0-args Java UDF should not be called only once
HyukjinKwon opened a new pull request #25108: [SPARK-28321][SQL] 0-args Java UDF should not be called only once URL: https://github.com/apache/spark/pull/25108 ## What changes were proposed in this pull request? Only the 0-args Java UDF calls the function before wrapping it as an expression. As a result, the function always returns the same value. This seems like a mistake. ## How was this patch tested? A unit test was added.
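The bug pattern described in this PR can be modeled in a few lines. This is a hedged Python sketch, not Spark's registration code: `wrap_eager`, `wrap_lazy`, and `zero_arg_udf` are hypothetical names. It contrasts calling a zero-argument function once at wrap time with deferring the call to each evaluation.

```python
import itertools

_counter = itertools.count()


def zero_arg_udf():
    """A zero-argument function whose result changes on every call."""
    return next(_counter)


def wrap_eager(func):
    """Bug pattern: the function runs once here, and the result is reused."""
    value = func()
    return lambda: value


def wrap_lazy(func):
    """Intended behavior: the function runs each time the wrapper is evaluated."""
    return lambda: func()


eager = wrap_eager(zero_arg_udf)  # consumes the first counter value (0)
lazy = wrap_lazy(zero_arg_udf)

eager_results = [eager() for _ in range(3)]
lazy_results = [lazy() for _ in range(3)]

if __name__ == "__main__":
    print(eager_results)  # the same value repeated
    print(lazy_results)   # a fresh value per evaluation
```

For a non-deterministic UDF such as a random number or timestamp generator, the eager form would silently return one frozen value for every row.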
[GitHub] [spark] dongjoon-hyun commented on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base
dongjoon-hyun commented on issue #25069: [SPARK-28270][SQL][PYTHON] Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base URL: https://github.com/apache/spark/pull/25069#issuecomment-510338469 @HyukjinKwon, the `udf/pgSQL/udf-aggregates_part1.sql - Regular Python UDF` test failed three times consecutively. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-jdk-11-ubuntu-testing/1122/testReport/org.apache.spark.sql/SQLQueryTestSuite/udf_pgSQL_udf_aggregates_part1_sql___Regular_Python_UDF/history/
[GitHub] [spark] maropu commented on a change in pull request #25074: [SPARK-27924]Support ANSI SQL Boolean-Predicate syntax
maropu commented on a change in pull request #25074: [SPARK-27924]Support ANSI SQL Boolean-Predicate syntax URL: https://github.com/apache/spark/pull/25074#discussion_r302371689

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/booleanExpressions.scala ##

@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode}
+import org.apache.spark.sql.types.BooleanType
+

Review comment: How about moving this into `predicates.scala`?
[GitHub] [spark] SparkQA commented on issue #25107: [SPARK-28344][SQL] detect ambiguous self-join and fail the query
SparkQA commented on issue #25107: [SPARK-28344][SQL] detect ambiguous self-join and fail the query URL: https://github.com/apache/spark/pull/25107#issuecomment-510336280 **[Test build #107505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107505/testReport)** for PR 25107 at commit [`1898674`](https://github.com/apache/spark/commit/189867428b8851da5242765641c2967608f1a833).
[GitHub] [spark] dongjoon-hyun commented on issue #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode
dongjoon-hyun commented on issue #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode URL: https://github.com/apache/spark/pull/24637#issuecomment-510336289 I made a PR to you that reverts all the other benchmark results to master and adds the result for the newly added test case: https://github.com/viirya/spark-1/pull/7
[GitHub] [spark] SparkQA commented on issue #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode
SparkQA commented on issue #24637: [SPARK-27707][SQL] Prune unnecessary nested fields from Generate in explode URL: https://github.com/apache/spark/pull/24637#issuecomment-510336276 **[Test build #107506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107506/testReport)** for PR 24637 at commit [`456e6d2`](https://github.com/apache/spark/commit/456e6d2a8f67786105f0b4363e9e3213df8fd554).