[spark] branch master updated (4e8980e6ae9 -> 5b5083484cd)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 4e8980e6ae9 [SPARK-41409][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_1043` to `WRONG_NUM_ARGS.WITHOUT_SUGGESTION`
     add 5b5083484cd [SPARK-41248][SQL] Add "spark.sql.json.enablePartialResults" to enable/disable JSON partial results

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/json/JacksonParser.scala    |  10 +-
 .../org/apache/spark/sql/internal/SQLConf.scala    |  11 ++
 sql/core/benchmarks/JsonBenchmark-results.txt      | 155 ++---
 .../org/apache/spark/sql/JsonFunctionsSuite.scala  |  67 +++--
 .../sql/execution/datasources/json/JsonSuite.scala |  25 +++-
 5 files changed, 158 insertions(+), 110 deletions(-)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
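For context, "partial results" here means that when one field of a JSON record fails to convert, the parser keeps the fields it did parse and nulls only the failing one, instead of discarding the whole row. A minimal stand-alone sketch of that semantics (plain Python on dicts, not Spark's token-stream-based `JacksonParser`; the `parse_row` helper and the `{field: converter}` schema shape are illustrative assumptions):

```python
import json
from typing import Callable, Dict, Optional

def parse_row(record: str, schema: Dict[str, Callable]) -> Optional[dict]:
    """Parse one JSON record against a {field: converter} schema.

    When one field fails to convert, keep the fields that did parse and
    null out the bad one, rather than nulling the entire row. Sketch
    only; Spark's JacksonParser works on a Jackson token stream.
    """
    try:
        raw = json.loads(record)
    except json.JSONDecodeError:
        return None  # malformed record: nothing recoverable
    row = {}
    for field, convert in schema.items():
        try:
            row[field] = convert(raw[field]) if raw.get(field) is not None else None
        except (TypeError, ValueError):
            row[field] = None  # partial result: only this field is lost
    return row

schema = {"id": int, "name": str}
print(parse_row('{"id": "not-a-number", "name": "a"}', schema))
```

With the flag off, the whole row would come back as nulls; with it on, `name` survives even though `id` could not be converted.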
[spark] branch master updated: [SPARK-41409][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_1043` to `WRONG_NUM_ARGS.WITHOUT_SUGGESTION`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4e8980e6ae9 [SPARK-41409][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_1043` to `WRONG_NUM_ARGS.WITHOUT_SUGGESTION`
4e8980e6ae9 is described below

commit 4e8980e6ae9a513bb4c990944841a9db073013ea
Author: yangjie01
AuthorDate: Wed Dec 14 08:22:33 2022 +0300

    [SPARK-41409][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_1043` to `WRONG_NUM_ARGS.WITHOUT_SUGGESTION`

    ### What changes were proposed in this pull request?
    This pr introduces sub-classes of `WRONG_NUM_ARGS`:
    - WITHOUT_SUGGESTION
    - WITH_SUGGESTION

    then replaces the existing `WRONG_NUM_ARGS` with `WRONG_NUM_ARGS.WITH_SUGGESTION` and renames the error class `_LEGACY_ERROR_TEMP_1043` to `WRONG_NUM_ARGS.WITHOUT_SUGGESTION`.

    ### Why are the changes needed?
    Proper names of error classes to improve the user experience with Spark SQL.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Added a new test case.

    Closes #38940 from LuciferYang/legacy-1043.
Lead-authored-by: yangjie01
Co-authored-by: YangJie
Signed-off-by: Max Gekk
---
 core/src/main/resources/error/error-classes.json     | 21 ++---
 .../spark/sql/errors/QueryCompilationErrors.scala    |  8
 .../resources/sql-tests/results/ansi/date.sql.out    |  2 +-
 .../sql-tests/results/ansi/string-functions.sql.out  |  4 ++--
 .../results/ceil-floor-with-scale-param.sql.out      |  4 ++--
 .../sql-tests/results/csv-functions.sql.out          |  2 +-
 .../test/resources/sql-tests/results/date.sql.out    |  2 +-
 .../sql-tests/results/datetime-legacy.sql.out        |  2 +-
 .../sql-tests/results/json-functions.sql.out         |  8
 .../results/sql-compatibility-functions.sql.out      |  2 +-
 .../sql-tests/results/string-functions.sql.out       |  4 ++--
 .../results/table-valued-functions.sql.out           |  2 +-
 .../sql-tests/results/timestamp-ntz.sql.out          |  2 +-
 .../resources/sql-tests/results/udaf/udaf.sql.out    |  2 +-
 .../sql-tests/results/udf/udf-udaf.sql.out           |  2 +-
 .../apache/spark/sql/DataFrameFunctionsSuite.scala   |  2 +-
 .../org/apache/spark/sql/DateFunctionsSuite.scala    |  2 +-
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala   |  2 +-
 .../org/apache/spark/sql/StringFunctionsSuite.scala  |  2 +-
 .../test/scala/org/apache/spark/sql/UDFSuite.scala   | 11 ++-
 .../sql/errors/QueryCompilationErrorsSuite.scala     | 13 +
 .../spark/sql/hive/execution/HiveUDAFSuite.scala     |  2 +-
 22 files changed, 57 insertions(+), 44 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index e1df3db4291..f66d6998e26 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -1548,8 +1548,20 @@
   },
   "WRONG_NUM_ARGS" : {
     "message" : [
-      "The <functionName> requires <expectedNum> parameters but the actual number is <actualNum>."
-    ]
+      "Invalid number of arguments for the function <functionName>."
+    ],
+    "subClass" : {
+      "WITHOUT_SUGGESTION" : {
+        "message" : [
+          "Please, refer to 'https://spark.apache.org/docs/latest/sql-ref-functions.html' for a fix."
+        ]
+      },
+      "WITH_SUGGESTION" : {
+        "message" : [
+          "Consider to change the number of arguments because the function requires <expectedNum> parameters but the actual number is <actualNum>."
+        ]
+      }
+    }
   },
   "_LEGACY_ERROR_TEMP_0001" : {
     "message" : [
@@ -2018,11 +2030,6 @@
       "Undefined function ."
     ]
   },
-  "_LEGACY_ERROR_TEMP_1043" : {
-    "message" : [
-      "Invalid arguments for function ."
-    ]
-  },
   "_LEGACY_ERROR_TEMP_1045" : {
     "message" : [
       "ALTER TABLE SET LOCATION does not support partition for v2 tables."
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
index b329f6689d4..a5ff2084ca8 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
@@ -640,7 +640,7 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase {
   def invalidFunctionArgumentsError(
       name: String, expectedNum: String, actualNum: Int): Throwable = {
     new AnalysisException(
-      errorClass = "WRONG_NUM_ARGS",
+      errorClass = "WRONG_NUM_ARGS.WITH_SUGGESTION",
       messageParameters = Map(
         "functionName" -> toSQLId(name),
         "expectedNum" -> expectedNum,
@@ -649,10 +649,10 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase {
   def
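For context, the `MAIN_CLASS.SUB_CLASS` naming used above resolves against `error-classes.json` by concatenating the main-class message with the sub-class message and substituting the `<param>` placeholders. A simplified sketch of that lookup (the `error_message` helper and the shortened message texts are illustrative assumptions, not Spark's actual `SparkThrowableHelper`):

```python
import re

# Shortened, illustrative subset of an error-classes.json-style dictionary.
ERROR_CLASSES = {
    "WRONG_NUM_ARGS": {
        "message": ["Invalid number of arguments for the function <functionName>."],
        "subClass": {
            "WITHOUT_SUGGESTION": {
                "message": ["Please, refer to the SQL function reference for a fix."]
            },
            "WITH_SUGGESTION": {
                "message": ["The function requires <expectedNum> parameters "
                            "but the actual number is <actualNum>."]
            },
        },
    }
}

def error_message(error_class: str, params: dict) -> str:
    """Resolve a dotted error class and substitute <param> placeholders."""
    main, _, sub = error_class.partition(".")
    entry = ERROR_CLASSES[main]
    parts = list(entry["message"])
    if sub:  # sub-class message is appended to the main-class message
        parts += entry["subClass"][sub]["message"]
    template = " ".join(parts)
    return re.sub(r"<(\w+)>", lambda m: str(params[m.group(1)]), template)

print(error_message(
    "WRONG_NUM_ARGS.WITH_SUGGESTION",
    {"functionName": "`lpad`", "expectedNum": "2 or 3", "actualNum": 4},
))
```

This is why the rename is purely additive for callers: `invalidFunctionArgumentsError` keeps the same `messageParameters` map and only the error-class string changes.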
[spark] branch master updated (ea53dc82f28 -> 1b3a4444b8c)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from ea53dc82f28 [SPARK-41506][CONNECT][TESTS][FOLLOW-UP] Import BinaryType in pyspark.sql.tests.connect.test_connect_column
     add 1b3a4444b8c [SPARK-27561][SQL][FOLLOWUP] Move the two rules for Lateral column alias into one file

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala     | 113 +--
 ...rence.scala => ResolveLateralColumnAlias.scala} | 125 -
 .../sql/catalyst/rules/RuleIdCollection.scala      |   2 +-
 3 files changed, 122 insertions(+), 118 deletions(-)
 rename sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/{ResolveLateralColumnAliasReference.scala => ResolveLateralColumnAlias.scala} (50%)
[spark] branch master updated: [SPARK-41506][CONNECT][TESTS][FOLLOW-UP] Import BinaryType in pyspark.sql.tests.connect.test_connect_column
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new ea53dc82f28 [SPARK-41506][CONNECT][TESTS][FOLLOW-UP] Import BinaryType in pyspark.sql.tests.connect.test_connect_column
ea53dc82f28 is described below

commit ea53dc82f28d2297deee4349f5f86e83f552e624
Author: Hyukjin Kwon
AuthorDate: Wed Dec 14 10:36:52 2022 +0900

    [SPARK-41506][CONNECT][TESTS][FOLLOW-UP] Import BinaryType in pyspark.sql.tests.connect.test_connect_column

    ### What changes were proposed in this pull request?
    This PR is a followup of https://github.com/apache/spark/pull/39047, which imports `BinaryType` that was removed in https://github.com/apache/spark/pull/39050. This was a logical conflict.

    ### Why are the changes needed?
    To recover the build.

    ### Does this PR introduce _any_ user-facing change?
    No, test-only.

    ### How was this patch tested?
    Manually verified by running `./dev/lint-python`.

    Closes #39055 from HyukjinKwon/SPARK-41506-followup.

    Authored-by: Hyukjin Kwon
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/sql/tests/connect/test_connect_column.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/python/pyspark/sql/tests/connect/test_connect_column.py b/python/pyspark/sql/tests/connect/test_connect_column.py
index b7645bc4b71..c997f94a1ea 100644
--- a/python/pyspark/sql/tests/connect/test_connect_column.py
+++ b/python/pyspark/sql/tests/connect/test_connect_column.py
@@ -41,6 +41,7 @@ from pyspark.sql.types import (
     TimestampType,
     TimestampNTZType,
     ByteType,
+    BinaryType,
     ShortType,
     IntegerType,
     FloatType,
[spark] branch master updated: [SPARK-41506][CONNECT][PYTHON] Refactor LiteralExpression to support DataType
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new cdc73ad36e5 [SPARK-41506][CONNECT][PYTHON] Refactor LiteralExpression to support DataType
cdc73ad36e5 is described below

commit cdc73ad36e53544add2cfb7ea66941202014303e
Author: Ruifeng Zheng
AuthorDate: Wed Dec 14 09:50:05 2022 +0900

    [SPARK-41506][CONNECT][PYTHON] Refactor LiteralExpression to support DataType

    ### What changes were proposed in this pull request?
    1. The existing `LiteralExpression` is a mixture of `Literal`, `CreateArray`, `CreateStruct` and `CreateMap`; since we have added the collection functions `array`, `struct` and `create_map`, the `CreateXXX` expressions can be replaced with `UnresolvedFunction`.
    2. Add a field `dataType` in `LiteralExpression`, so we can specify the DataType if needed; a special case is the typed null.
    3. It is up to the `lit` function to infer the DataType, not `LiteralExpression` itself.

    ### Why are the changes needed?
    Refactor LiteralExpression to support DataType.

    ### Does this PR introduce _any_ user-facing change?
    No. `LiteralExpression` is an internal class and should not be exposed to end users.

    ### How was this patch tested?
    Added UT.

    Closes #39047 from zhengruifeng/connect_lit_datatype.
Authored-by: Ruifeng Zheng
Signed-off-by: Hyukjin Kwon
---
 .../main/protobuf/spark/connect/expressions.proto  |  23 +--
 .../planner/LiteralValueProtoConverter.scala       |  31 +--
 python/pyspark/sql/connect/column.py               | 213 +++--
 python/pyspark/sql/connect/functions.py            |  13 +-
 .../pyspark/sql/connect/proto/expressions_pb2.py   |  86 ++---
 .../pyspark/sql/connect/proto/expressions_pb2.pyi  | 115 +--
 .../sql/tests/connect/test_connect_column.py       | 122 +++-
 .../connect/test_connect_column_expressions.py     |  74 +--
 8 files changed, 365 insertions(+), 312 deletions(-)

diff --git a/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto b/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto
index 6c0facbfeee..c906f15e0a6 100644
--- a/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto
+++ b/connector/connect/common/src/main/protobuf/spark/connect/expressions.proto
@@ -77,9 +77,7 @@ message Expression {
       int32 year_month_interval = 20;
       int64 day_time_interval = 21;

-      Array array = 22;
-      Struct struct = 23;
-      Map map = 24;
+      DataType typed_null = 22;
     }

     // whether the literal type should be treated as a nullable type. Applies to
@@ -107,25 +105,6 @@ message Expression {
       int32 days = 2;
       int64 microseconds = 3;
     }
-
-    message Struct {
-      // A possibly heterogeneously typed list of literals
-      repeated Literal fields = 1;
-    }
-
-    message Array {
-      // A homogeneously typed list of literals
-      repeated Literal values = 1;
-    }
-
-    message Map {
-      repeated Pair pairs = 1;
-
-      message Pair {
-        Literal key = 1;
-        Literal value = 2;
-      }
-    }
   }

   // An unresolved attribute that is not explicitly bound to a specific column, but the column
diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala
index 5a54ad9ac64..46f6db64b8c 100644
--- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala
+++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala
@@ -17,11 +17,8 @@

 package org.apache.spark.sql.connect.planner

-import scala.collection.JavaConverters._
-
 import org.apache.spark.connect.proto
-import org.apache.spark.sql.catalyst.{expressions, InternalRow}
-import org.apache.spark.sql.catalyst.expressions.{CreateArray, CreateMap, CreateStruct}
+import org.apache.spark.sql.catalyst.expressions
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String}

@@ -99,20 +96,6 @@ object LiteralValueProtoConverter {
     case proto.Expression.Literal.LiteralTypeCase.DAY_TIME_INTERVAL =>
       expressions.Literal(lit.getDayTimeInterval, DayTimeIntervalType())

-    case proto.Expression.Literal.LiteralTypeCase.ARRAY =>
-      val literals = lit.getArray.getValuesList.asScala.toArray.map(toCatalystExpression)
-      CreateArray(literals)
-
-    case proto.Expression.Literal.LiteralTypeCase.STRUCT =>
-      val literals = lit.getStruct.getFieldsList.asScala.toArray.map(toCatalystExpression)
-      CreateStruct(literals)
-
-    case
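The division of labour the commit message describes — `LiteralExpression` merely stores a value together with an explicit data type, while `lit` does the inference — can be sketched like this (a Python stand-in: the `LiteralExpression` dataclass and the DataType strings here are illustrative assumptions, not the actual `pyspark.sql.connect` classes):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class LiteralExpression:
    """Stores the value and its type; performs no inference itself."""
    value: Any
    data_type: str  # explicit, so a "typed null" is just (None, "binary")

def lit(value: Any, data_type: Optional[str] = None) -> LiteralExpression:
    """Type inference lives here, in the helper, not in the expression."""
    if data_type is not None:  # caller pinned the type (e.g. a typed null)
        return LiteralExpression(value, data_type)
    inferred = {bool: "boolean", int: "long", float: "double",
                str: "string", bytes: "binary"}.get(type(value))
    if inferred is None:
        # lists/dicts would now go through functions like array()/create_map()
        raise TypeError(f"cannot infer a literal type for {value!r}")
    return LiteralExpression(value, inferred)

assert lit(1).data_type == "long"
assert lit(None, "binary") == LiteralExpression(None, "binary")
```

The `TypeError` branch reflects the first point of the refactor: collection values are no longer the literal's job, so they are rejected here rather than turned into `CreateArray`-style expressions.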
[spark] branch master updated: [SPARK-41412][CONNECT][TESTS][FOLLOW-UP] Exclude binary casting to make the tests to pass with/without ANSI mode
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a75bc841db2 [SPARK-41412][CONNECT][TESTS][FOLLOW-UP] Exclude binary casting to make the tests to pass with/without ANSI mode
a75bc841db2 is described below

commit a75bc841db200a8d79ae8aabf3bff0308bcaadbb
Author: Hyukjin Kwon
AuthorDate: Wed Dec 14 08:35:04 2022 +0900

    [SPARK-41412][CONNECT][TESTS][FOLLOW-UP] Exclude binary casting to make the tests to pass with/without ANSI mode

    ### What changes were proposed in this pull request?
    This PR is another followup of https://github.com/apache/spark/pull/39034 that instead makes the tests pass with/without ANSI mode.

    ### Why are the changes needed?
    Spark Connect uses an isolated Spark session, so setting the configuration on the PySpark side does not take effect. Therefore, the test still fails, see https://github.com/apache/spark/actions/runs/3681383627/jobs/6228030132. We should make the tests pass with/without ANSI mode for now.

    ### Does this PR introduce _any_ user-facing change?
    No, test-only.

    ### How was this patch tested?
    Manually tested via:

    ```bash
    SPARK_ANSI_SQL_MODE=true ./python/run-tests --testnames 'pyspark.sql.tests.connect.test_connect_column'
    ```

    Closes #39050 from HyukjinKwon/SPARK-41412.
Authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
---
 .../sql/tests/connect/test_connect_column.py | 35 ++
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/python/pyspark/sql/tests/connect/test_connect_column.py b/python/pyspark/sql/tests/connect/test_connect_column.py
index e6701231990..8b70b4d9a44 100644
--- a/python/pyspark/sql/tests/connect/test_connect_column.py
+++ b/python/pyspark/sql/tests/connect/test_connect_column.py
@@ -26,7 +26,6 @@ from pyspark.sql.types import (
     DoubleType,
     LongType,
     DecimalType,
-    BinaryType,
     BooleanType,
 )
 from pyspark.testing.connectutils import should_test_connect
@@ -153,25 +152,21 @@ class SparkConnectTests(SparkConnectSQLTestCase):
             df.select(df.id.cast("string")).toPandas(), df2.select(df2.id.cast("string")).toPandas()
         )

-        # Test if the arguments can be passed properly.
-        # Do not need to check individual behaviour for the ANSI mode thoroughly.
-        with self.sql_conf({"spark.sql.ansi.enabled": False}):
-            for x in [
-                StringType(),
-                BinaryType(),
-                ShortType(),
-                IntegerType(),
-                LongType(),
-                FloatType(),
-                DoubleType(),
-                ByteType(),
-                DecimalType(10, 2),
-                BooleanType(),
-                DayTimeIntervalType(),
-            ]:
-                self.assert_eq(
-                    df.select(df.id.cast(x)).toPandas(), df2.select(df2.id.cast(x)).toPandas()
-                )
+        for x in [
+            StringType(),
+            ShortType(),
+            IntegerType(),
+            LongType(),
+            FloatType(),
+            DoubleType(),
+            ByteType(),
+            DecimalType(10, 2),
+            BooleanType(),
+            DayTimeIntervalType(),
+        ]:
+            self.assert_eq(
+                df.select(df.id.cast(x)).toPandas(), df2.select(df2.id.cast(x)).toPandas()
+            )

     def test_unsupported_functions(self):
         # SPARK-41225: Disable unsupported functions.
[spark] branch master updated: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e29ada0c13e [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`
e29ada0c13e is described below

commit e29ada0c13e71aaad0566ef67591a33d4c58fe2a
Author: itholic
AuthorDate: Tue Dec 13 21:48:11 2022 +0300

    [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

    ### What changes were proposed in this pull request?
    This PR proposes to rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`. Also, show `sqlExprs` rather than `treeNode`, which is more useful information to users.

    ### Why are the changes needed?
    The sub-error class name duplicates its main class, `UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY`. We should make all error class names clear and brief.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    ```
    ./build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*"
    ```

    Closes #38576 from itholic/SPARK-41062.
Lead-authored-by: itholic
Co-authored-by: Haejoon Lee <44108233+itho...@users.noreply.github.com>
Signed-off-by: Max Gekk
---
 core/src/main/resources/error/error-classes.json             | 10 +-
 .../apache/spark/sql/catalyst/analysis/CheckAnalysis.scala   |  7 ---
 .../spark/sql/catalyst/analysis/ResolveSubquerySuite.scala   | 13 -
 .../subquery/negative-cases/invalid-correlation.sql.out      |  4 ++--
 .../src/test/scala/org/apache/spark/sql/SubquerySuite.scala  | 12 +---
 5 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index 25362d5893f..e1df3db4291 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -1471,6 +1471,11 @@
       "A correlated outer name reference within a subquery expression body was not found in the enclosing query: "
     ]
   },
+  "CORRELATED_REFERENCE" : {
+    "message" : [
+      "Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses: <sqlExprs>"
+    ]
+  },
   "LATERAL_JOIN_CONDITION_NON_DETERMINISTIC" : {
     "message" : [
       "Lateral join condition cannot be non-deterministic: "
     ]
   },
@@ -1496,11 +1501,6 @@
       "Non-deterministic lateral subqueries are not supported when joining with outer relations that produce more than one row"
     ]
   },
-  "UNSUPPORTED_CORRELATED_REFERENCE" : {
-    "message" : [
-      "Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses"
-    ]
-  },
   "UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE" : {
     "message" : [
       "Correlated column reference '' cannot be type"
     ]
   },
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index e7e153a319d..5303364710c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -1089,11 +1089,12 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
     // 2. Expressions containing outer references on plan nodes other than allowed operators.
     def failOnInvalidOuterReference(p: LogicalPlan): Unit = {
       p.expressions.foreach(checkMixedReferencesInsideAggregateExpr)
-      if (!canHostOuter(p) && p.expressions.exists(containsOuter)) {
+      val exprs = stripOuterReferences(p.expressions.filter(expr => containsOuter(expr)))
+      if (!canHostOuter(p) && !exprs.isEmpty) {
         p.failAnalysis(
           errorClass =
-            "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE",
-          messageParameters = Map("treeNode" -> planToString(p)))
+            "UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.CORRELATED_REFERENCE",
+          messageParameters = Map("sqlExprs" -> exprs.map(toSQLExpr).mkString(",")))
       }
     }

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveSubquerySuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveSubquerySuite.scala
index 577f663d8b1..7b99153acf9 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveSubquerySuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveSubquerySuite.scala
@@ -51,11 +51,14 @@ class
[spark] branch master updated: [SPARK-41482][BUILD] Upgrade dropwizard metrics 4.2.13
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e2474f63f1b [SPARK-41482][BUILD] Upgrade dropwizard metrics 4.2.13
e2474f63f1b is described below

commit e2474f63f1b588f22279ed4e51b50924ecae9e86
Author: yangjie01
AuthorDate: Tue Dec 13 10:38:17 2022 -0800

    [SPARK-41482][BUILD] Upgrade dropwizard metrics 4.2.13

    ### What changes were proposed in this pull request?
    This pr aims to upgrade dropwizard metrics to 4.2.13.

    ### Why are the changes needed?
    The release notes are as follows:
    - https://github.com/dropwizard/metrics/releases/tag/v4.2.13

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass Github Actions.

    Closes #39026 from LuciferYang/metrics-4213.

    Lead-authored-by: yangjie01
    Co-authored-by: YangJie
    Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 10 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 10 +-
 pom.xml                               |  2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3
index ae7cc9d592c..40741e1d75b 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -194,11 +194,11 @@ log4j-slf4j2-impl/2.19.0//log4j-slf4j2-impl-2.19.0.jar
 logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
 mesos/1.4.3/shaded-protobuf/mesos-1.4.3-shaded-protobuf.jar
-metrics-core/4.2.12//metrics-core-4.2.12.jar
-metrics-graphite/4.2.12//metrics-graphite-4.2.12.jar
-metrics-jmx/4.2.12//metrics-jmx-4.2.12.jar
-metrics-json/4.2.12//metrics-json-4.2.12.jar
-metrics-jvm/4.2.12//metrics-jvm-4.2.12.jar
+metrics-core/4.2.13//metrics-core-4.2.13.jar
+metrics-graphite/4.2.13//metrics-graphite-4.2.13.jar
+metrics-jmx/4.2.13//metrics-jmx-4.2.13.jar
+metrics-json/4.2.13//metrics-json-4.2.13.jar
+metrics-jvm/4.2.13//metrics-jvm-4.2.13.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.84.Final//netty-all-4.1.84.Final.jar
 netty-buffer/4.1.84.Final//netty-buffer-4.1.84.Final.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index f70abedd34b..162816bdc39 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -178,11 +178,11 @@ log4j-slf4j2-impl/2.19.0//log4j-slf4j2-impl-2.19.0.jar
 logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
 mesos/1.4.3/shaded-protobuf/mesos-1.4.3-shaded-protobuf.jar
-metrics-core/4.2.12//metrics-core-4.2.12.jar
-metrics-graphite/4.2.12//metrics-graphite-4.2.12.jar
-metrics-jmx/4.2.12//metrics-jmx-4.2.12.jar
-metrics-json/4.2.12//metrics-json-4.2.12.jar
-metrics-jvm/4.2.12//metrics-jvm-4.2.12.jar
+metrics-core/4.2.13//metrics-core-4.2.13.jar
+metrics-graphite/4.2.13//metrics-graphite-4.2.13.jar
+metrics-jmx/4.2.13//metrics-jmx-4.2.13.jar
+metrics-json/4.2.13//metrics-json-4.2.13.jar
+metrics-jvm/4.2.13//metrics-jvm-4.2.13.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.84.Final//netty-all-4.1.84.Final.jar
 netty-buffer/4.1.84.Final//netty-buffer-4.1.84.Final.jar
diff --git a/pom.xml b/pom.xml
index da7c8eccfce..e5d8d2d06ba 100644
--- a/pom.xml
+++ b/pom.xml
@@ -151,7 +151,7 @@
          If you changes codahale.metrics.version, you also need to change
          the link to metrics.dropwizard.io in docs/monitoring.md.
     -->
-    <codahale.metrics.version>4.2.12</codahale.metrics.version>
+    <codahale.metrics.version>4.2.13</codahale.metrics.version>
     1.11.1
     1.12.0
[spark] branch master updated (7e9b88bfceb -> d00771f5ee2)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 7e9b88bfceb [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project
     add d00771f5ee2 [SPARK-39601][YARN][FOLLOWUP] YarnClusterSchedulerBackend should call super.stop()

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/scheduler/cluster/YarnClusterSchedulerBackend.scala | 1 +
 1 file changed, 1 insertion(+)
[spark] branch master updated: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 7e9b88bfceb [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project
7e9b88bfceb is described below

commit 7e9b88bfceb86d3b32e82a86b672aab3c74def8c
Author: Xinyi Yu
AuthorDate: Wed Dec 14 00:14:06 2022 +0800

    [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project

    ### What changes were proposed in this pull request?
    This PR implements a new feature: implicit lateral column alias on the `Project` case, controlled by `spark.sql.lateralColumnAlias.enableImplicitResolution` temporarily (default false now, but this conf will be turned on once the feature is completely merged).

    Lateral column alias

    View https://issues.apache.org/jira/browse/SPARK-27561 for more details on lateral column aliases. There are two main cases to support: LCA in Project, and LCA in Aggregate.

    ```sql
    -- LCA in Project. The base_salary references an attribute defined by a previous alias
    SELECT salary AS base_salary, base_salary + bonus AS total_salary
    FROM employee

    -- LCA in Aggregate. The avg_salary references an attribute defined by a previous alias
    SELECT dept, average(salary) AS avg_salary, avg_salary + average(bonus)
    FROM employee
    GROUP BY dept
    ```

    This **implicit** lateral column alias (no explicit keyword, e.g. `lateral.base_salary`) should be supported.

    High level design

    This PR defines a new resolution rule, `ResolveLateralColumnAlias`, to resolve the implicit lateral column alias, covering the `Project` case. It introduces a new leaf-node NamedExpression, `LateralColumnAliasReference`, as a placeholder used to hold a reference that has been temporarily resolved as the reference to a lateral column alias.

    The whole process is generally divided into two phases: 1) recognize a **resolved** lateral alias and wrap the attributes referencing it with `LateralColumnAliasReference`; 2) when the whole operator is resolved, unwrap `LateralColumnAliasReference`. For Project, it further resolves the attributes and pushes down the referenced lateral aliases to a new Project. For example:

    ```
    // Before
    Project [age AS a, 'a + 1]
    +- Child

    // After phase 1
    Project [age AS a, lateralalias(a) + 1]
    +- Child

    // After phase 2
    Project [a, a + 1]
    +- Project [child output, age AS a]
       +- Child
    ```

    Resolution order

    Given this new rule, the name resolution order will be (higher -> lower):
    ```
    local table column > local metadata attribute > local lateral column alias > all others (outer reference of subquery, parameters of SQL UDF, ..)
    ```

    There is a recent refactor that moves the creation of `OuterReference` into the Resolution batch: https://github.com/apache/spark/pull/38851. Because a lateral column alias has higher resolution priority than an outer reference, the rule will try to resolve an `OuterReference` using the lateral column alias, similar to an `UnresolvedAttribute`. On success, it strips the `OuterReference` and also wraps it with `LateralColumnAliasReference`.

    ### Why are the changes needed?
    The lateral column alias is a popular feature wanted for a long time. It is supported by lots of other database vendors (Redshift, Snowflake, etc.) and provides a better user experience.

    ### Does this PR introduce _any_ user-facing change?
    Yes; as shown in the above example, it will be able to resolve lateral column aliases. I will write the migration guide or release note when most PRs of this feature are merged.

    ### How was this patch tested?
    Existing tests and newly added tests.

    Closes #38776 from anchovYu/SPARK-27561-refactor.
Authored-by: Xinyi Yu
Signed-off-by: Wenchen Fan
---
 core/src/main/resources/error/error-classes.json   |   6 +
 .../sql/catalyst/expressions/AttributeMap.scala    |   3 +-
 .../sql/catalyst/expressions/AttributeMap.scala    |   3 +
 .../spark/sql/catalyst/analysis/Analyzer.scala     | 119 +++-
 .../sql/catalyst/analysis/CheckAnalysis.scala      |  25 +-
 .../ResolveLateralColumnAliasReference.scala       | 135 +
 .../catalyst/expressions/namedExpressions.scala    |  33 +++
 .../spark/sql/catalyst/expressions/subquery.scala  |   9 +-
 .../sql/catalyst/rules/RuleIdCollection.scala      |   2 +
 .../spark/sql/catalyst/trees/TreePatterns.scala    |   1 +
 .../spark/sql/errors/QueryCompilationErrors.scala  |  19 ++
 .../org/apache/spark/sql/internal/SQLConf.scala    |  11 +
 .../apache/spark/sql/LateralColumnAliasSuite.scala | 327 +
 13 files changed, 686 insertions(+), 7 deletions(-)

diff --git
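The two-phase Project rewrite described in the commit message above can be sketched on a toy token representation (the `split_project` helper and the `(tokens, alias)` shape are illustrative assumptions; Spark's actual rule operates on resolved `NamedExpression` trees):

```python
def split_project(select_items, child_columns):
    """Split one SELECT list into [outer, inner] projection lists when it
    contains lateral column alias references. Toy model, not the Analyzer.

    select_items: list of (tokens, alias) pairs; a token is a column name,
    an earlier alias, or an operator/literal string.
    """
    defined = {}        # alias -> its expression tokens, in definition order
    lateral = set()     # aliases referenced laterally by a later item
    for tokens, alias in select_items:  # phase 1: recognise lateral refs
        for t in tokens:
            # a real child column wins over a lateral alias (resolution order)
            if t not in child_columns and t in defined:
                lateral.add(t)
        if alias is not None:
            defined[alias] = tokens
    if not lateral:
        return [select_items]           # no lateral references: one Project
    # phase 2: push the referenced aliases down into an inner Project ...
    inner = [([c], None) for c in child_columns]
    inner += [(defined[a], a) for a in sorted(lateral)]
    # ... and rewrite the outer Project to reference them by name
    outer = [([alias], None) if alias in lateral else (tokens, alias)
             for tokens, alias in select_items]
    return [outer, inner]

# SELECT age AS a, a + 1 FROM child  becomes
# Project [a, a + 1]  over  Project [age, age AS a]
print(split_project([(["age"], "a"), (["a", "+", "1"], None)], ["age"]))
```

This mirrors the `Before`/`After phase 2` plans in the commit message: the alias definition moves into the inner projection together with the child output, and the outer projection only sees already-computed attributes.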
[spark] branch branch-3.2 updated: [SPARK-41360][CORE][BUILD][FOLLOW-UP] Exclude BlockManagerMessages.RegisterBlockManager in MiMa
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new c6bd8207673 [SPARK-41360][CORE][BUILD][FOLLOW-UP] Exclude BlockManagerMessages.RegisterBlockManager in MiMa
c6bd8207673 is described below

commit c6bd82076739abc694ab5327f000d6fb1eb9211c
Author: Hyukjin Kwon
AuthorDate: Tue Dec 13 23:34:09 2022 +0900

    [SPARK-41360][CORE][BUILD][FOLLOW-UP] Exclude BlockManagerMessages.RegisterBlockManager in MiMa

    This PR is a followup of https://github.com/apache/spark/pull/38876 that excludes BlockManagerMessages.RegisterBlockManager in the MiMa compatibility check. It fails in the MiMa check, presumably with Scala 2.13, in other branches. It should be safer to exclude them all in the affected branches.

    No, dev-only. Filters copied from error messages. Will monitor the build in other branches.

    Closes #39052 from HyukjinKwon/SPARK-41360-followup.
Authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
(cherry picked from commit a2ceff29f9d1c0133fa0c8274fa84c43106e90f0)
Signed-off-by: Hyukjin Kwon
---
 project/MimaExcludes.scala | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 7957062f332..07add4ce469 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -72,7 +72,13 @@ object MimaExcludes {
       ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.shuffle.api.ShuffleMapOutputWriter.commitAllPartitions"),

       // [SPARK-37391][SQL] JdbcConnectionProvider tells if it modifies security context
-      ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.jdbc.JdbcConnectionProvider.modifiesSecurityContext")
+      ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.jdbc.JdbcConnectionProvider.modifiesSecurityContext"),
+
+      // [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost
+      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.copy"),
+      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.this"),
+      ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.storage.BlockManagerMessages$RegisterBlockManager$"),
+      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.apply")
     )

     // Exclude rules for 3.1.x
[spark] branch branch-3.3 updated: [SPARK-41360][CORE][BUILD][FOLLOW-UP] Exclude BlockManagerMessages.RegisterBlockManager in MiMa
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 9f7baaec5e2 [SPARK-41360][CORE][BUILD][FOLLOW-UP] Exclude BlockManagerMessages.RegisterBlockManager in MiMa

9f7baaec5e2 is described below:

commit 9f7baaec5e2850fbd51a146b52309189bf83379c
Author: Hyukjin Kwon
AuthorDate: Tue Dec 13 23:34:09 2022 +0900

[SPARK-41360][CORE][BUILD][FOLLOW-UP] Exclude BlockManagerMessages.RegisterBlockManager in MiMa

This PR is a followup of https://github.com/apache/spark/pull/38876 that excludes BlockManagerMessages.RegisterBlockManager from the MiMa compatibility check. The check fails, presumably with Scala 2.13, in other branches, so it is safer to exclude these entries in all affected branches.

No user-facing change; dev-only. The filters were copied from the error messages; the builds in the other branches will be monitored.

Closes #39052 from HyukjinKwon/SPARK-41360-followup.

Authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
(cherry picked from commit a2ceff29f9d1c0133fa0c8274fa84c43106e90f0)
Signed-off-by: Hyukjin Kwon
---
 project/MimaExcludes.scala | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

```diff
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index 8f3bd43ec65..ae8aa7e5cb3 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -68,7 +68,13 @@ object MimaExcludes {
     // [SPARK-38908][SQL] Provide query context in runtime error of Casting from String to
     // Number/Date/Timestamp/Boolean
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.types.Decimal.fromStringANSI")
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.types.Decimal.fromStringANSI"),
+
+    // [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.copy"),
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.this"),
+    ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.storage.BlockManagerMessages$RegisterBlockManager$"),
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.apply")
   )

   // Exclude rules for 3.2.x from 3.1.1
```
[spark] branch master updated: [SPARK-41360][CORE][BUILD][FOLLOW-UP] Exclude BlockManagerMessages.RegisterBlockManager in MiMa
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a2ceff29f9d [SPARK-41360][CORE][BUILD][FOLLOW-UP] Exclude BlockManagerMessages.RegisterBlockManager in MiMa

a2ceff29f9d is described below:

commit a2ceff29f9d1c0133fa0c8274fa84c43106e90f0
Author: Hyukjin Kwon
AuthorDate: Tue Dec 13 23:34:09 2022 +0900

[SPARK-41360][CORE][BUILD][FOLLOW-UP] Exclude BlockManagerMessages.RegisterBlockManager in MiMa

### What changes were proposed in this pull request?
This PR is a followup of https://github.com/apache/spark/pull/38876 that excludes BlockManagerMessages.RegisterBlockManager from the MiMa compatibility check.

### Why are the changes needed?
The check fails, presumably with Scala 2.13, in other branches, so it is safer to exclude these entries in all affected branches.

### Does this PR introduce _any_ user-facing change?
No, dev-only.

### How was this patch tested?
The filters were copied from the error messages; the builds in the other branches will be monitored.

Closes #39052 from HyukjinKwon/SPARK-41360-followup.

Authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
---
 project/MimaExcludes.scala | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

```diff
diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala
index eed79d1f204..7ec4ef37a0d 100644
--- a/project/MimaExcludes.scala
+++ b/project/MimaExcludes.scala
@@ -123,7 +123,13 @@ object MimaExcludes {
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.streaming.StreamingQueryException.this"),

     // [SPARK-41180][SQL] Reuse INVALID_SCHEMA instead of _LEGACY_ERROR_TEMP_1227
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.types.DataType.parseTypeWithFallback")
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.types.DataType.parseTypeWithFallback"),
+
+    // [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.copy"),
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.this"),
+    ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.storage.BlockManagerMessages$RegisterBlockManager$"),
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.apply")
   )

   // Defulat exclude rules
```
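The three backports above all add the same kind of entry to `project/MimaExcludes.scala`. As a rough sketch of the pattern (the exact surrounding code is in the diffs above), each exclusion rule pairs a MiMa problem type with the fully qualified member the binary-compatibility check should ignore. This fragment is illustrative build configuration, not a standalone program; `ProblemFilters` and the problem types come from sbt-mima-plugin:

```scala
import com.typesafe.tools.mima.core._

object ExampleExcludes {
  lazy val excludes = Seq(
    // Ignore a method that disappeared from a case class's generated API
    // (e.g. copy/apply/this after a constructor parameter was added):
    ProblemFilters.exclude[DirectMissingMethodProblem](
      "org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.copy"),
    // Ignore a changed type hierarchy of the companion object
    // (the trailing `$` names the companion in JVM terms):
    ProblemFilters.exclude[MissingTypesProblem](
      "org.apache.spark.storage.BlockManagerMessages$RegisterBlockManager$")
  )
}
```

The `#` separates an outer object from a nested class, which is why the same message type appears once with `#` (the nested case class) and once with `$...$` (its companion object).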
[spark] branch master updated: [SPARK-39601][YARN] AllocationFailure should not be treated as exitCausedByApp when driver is shutting down
This is an automated email from the ASF dual-hosted git repository. tgraves pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new e857b7ad1c7 [SPARK-39601][YARN] AllocationFailure should not be treated as exitCausedByApp when driver is shutting down

e857b7ad1c7 is described below:

commit e857b7ad1c78c57d06436e387473d83e61293c7c
Author: Cheng Pan
AuthorDate: Tue Dec 13 08:18:08 2022 -0600

[SPARK-39601][YARN] AllocationFailure should not be treated as exitCausedByApp when driver is shutting down

### What changes were proposed in this pull request?
Treat container `AllocationFailure` as not "exitCausedByApp" when the driver is shutting down. The approach was suggested at https://github.com/apache/spark/pull/36991#discussion_r915948343

### Why are the changes needed?
Some Spark applications successfully complete all jobs but fail during the shutdown phase with the reason "Max number of executor failures (16) reached". The timeline is:

Driver - the job succeeds and Spark starts the shutdown procedure:
```
2022-06-23 19:50:55 CST AbstractConnector INFO - Stopped Spark@74e9431b{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
2022-06-23 19:50:55 CST SparkUI INFO - Stopped Spark web UI at http://hadoop2627.xxx.org:28446
2022-06-23 19:50:55 CST YarnClusterSchedulerBackend INFO - Shutting down all executors
```

Driver - a container is allocated successfully during the shutdown phase:
```
2022-06-23 19:52:21 CST YarnAllocator INFO - Launching container container_e94_1649986670278_7743380_02_25 on host hadoop4388.xxx.org for executor with ID 24 for ResourceProfile Id 0
```

Executor - the executor cannot connect to the driver endpoint because the driver has already stopped it:
```
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1911)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:393)
    at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend$.main(YarnCoarseGrainedExecutorBackend.scala:81)
    at org.apache.spark.executor.YarnCoarseGrainedExecutorBackend.main(YarnCoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$9(CoarseGrainedExecutorBackend.scala:413)
    at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
    at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
    at scala.collection.immutable.Range.foreach(Range.scala:158)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:411)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
    ... 4 more
Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: Cannot find endpoint: spark://CoarseGrainedScheduler@hadoop2627.xxx.org:21956
    at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$asyncSetupEndpointRefByURI$1(NettyRpcEnv.scala:148)
    at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$asyncSetupEndpointRefByURI$1$adapted(NettyRpcEnv.scala:144)
    at scala.concurrent.Future.$anonfun$flatMap$1(Future.scala:307)
    at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:41)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
    at org.apache.spark.util.ThreadUtils$$anon$1.execute(ThreadUtils.scala:99)
    at scala.concurrent.impl.ExecutionContextImpl$$anon$4.execute(ExecutionContextImpl.scala:138)
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
    at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
    at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
```
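The fix described above can be sketched as a small decision function: once the driver has begun shutting down, an executor allocation failure should no longer count against the application's failure budget ("exitCausedByApp"). This is a hypothetical, simplified model of the idea, not Spark's actual `YarnAllocator` code; the names `ExitPolicy`, `ExecutorExit`, and `ApplicationError` are invented for this sketch:

```scala
object ExitPolicy {
  sealed trait ExecutorExit
  case object AllocationFailure extends ExecutorExit
  case object ApplicationError extends ExecutorExit

  // Decide whether an executor exit should count toward the app's
  // max-executor-failures budget.
  def exitCausedByApp(exit: ExecutorExit, driverShuttingDown: Boolean): Boolean =
    exit match {
      // Allocation failures during driver teardown are expected noise:
      // the driver has already stopped its endpoints, so newly launched
      // executors cannot register and will fail through no fault of the app.
      case AllocationFailure if driverShuttingDown => false
      // Otherwise an allocation failure still counts (assumption of this sketch).
      case AllocationFailure => true
      case ApplicationError => true
    }
}
```

With this policy, the scenario in the logs above (a container launched at 19:52, two minutes into shutdown) would not push the application over its failure limit.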
[spark] branch master updated (0e2d604fd33 -> 3809ccdca6e)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

from 0e2d604fd33 [SPARK-41406][SQL] Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic
add  3809ccdca6e [SPARK-41478][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1234

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/error/error-classes.json  | 10 +-
 .../spark/sql/errors/QueryCompilationErrors.scala |  4 ++--
 .../spark/sql/StatisticsCollectionSuite.scala     | 23 +-
 .../apache/spark/sql/execution/SQLViewSuite.scala | 11 +++
 4 files changed, 28 insertions(+), 20 deletions(-)
[spark] branch master updated: [SPARK-41406][SQL] Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0e2d604fd33 [SPARK-41406][SQL] Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic

0e2d604fd33 is described below:

commit 0e2d604fd33c8236cfa8ae243eeaec42d3176a06
Author: panbingkun
AuthorDate: Tue Dec 13 14:02:36 2022 +0300

[SPARK-41406][SQL] Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic

### What changes were proposed in this pull request?
The pr aims to refactor the error message for `NUM_COLUMNS_MISMATCH` to make it more generic.

### Why are the changes needed?
The changes improve the error framework.

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Update existing UT. Pass GA.

Closes #38937 from panbingkun/SPARK-41406.

Authored-by: panbingkun
Signed-off-by: Max Gekk
---
 core/src/main/resources/error/error-classes.json    |   2 +-
 .../sql/catalyst/analysis/CheckAnalysis.scala       |   4 +-
 .../plans/logical/basicLogicalOperators.scala       |   4 +-
 .../resources/sql-tests/results/except-all.sql.out  |   6 +-
 .../sql-tests/results/intersect-all.sql.out         |   6 +-
 .../native/widenSetOperationTypes.sql.out           | 140 ++---
 .../sql-tests/results/udf/udf-except-all.sql.out    |   6 +-
 .../results/udf/udf-intersect-all.sql.out           |   6 +-
 .../spark/sql/DataFrameSetOperationsSuite.scala     |   9 +-
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala  |  22 +++-
 10 files changed, 110 insertions(+), 95 deletions(-)

(The `<...>` placeholders below were stripped by the mail archive's HTML handling; they are restored here to match the parameter names in the `CheckAnalysis.scala` hunk.)

```diff
diff --git a/core/src/main/resources/error/error-classes.json b/core/src/main/resources/error/error-classes.json
index e76328e970d..6faaf0af35f 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -943,7 +943,7 @@
   },
   "NUM_COLUMNS_MISMATCH" : {
     "message" : [
-      "<operator> can only be performed on tables with the same number of columns, but the first table has <refNumColumns> columns and the <invalidOrdinalNum> table has <invalidNumColumns> columns."
+      "<operator> can only be performed on inputs with the same number of columns, but the first input has <firstNumColumns> columns and the <invalidOrdinalNum> input has <invalidNumColumns> columns."
     ]
   },
   "ORDER_BY_POS_OUT_OF_RANGE" : {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index 12dac5c632a..be812adaaa1 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -552,7 +552,7 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
               errorClass = "NUM_COLUMNS_MISMATCH",
               messageParameters = Map(
                 "operator" -> toSQLStmt(operator.nodeName),
-                "refNumColumns" -> ref.length.toString,
+                "firstNumColumns" -> ref.length.toString,
                 "invalidOrdinalNum" -> ordinalNumber(ti + 1),
                 "invalidNumColumns" -> child.output.length.toString))
           }
@@ -565,7 +565,7 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog with QueryErrorsB
                 e.failAnalysis(
                   errorClass = "_LEGACY_ERROR_TEMP_2430",
                   messageParameters = Map(
-                    "operator" -> operator.nodeName,
+                    "operator" -> toSQLStmt(operator.nodeName),
                     "ci" -> ordinalNumber(ci),
                     "ti" -> ordinalNumber(ti + 1),
                     "dt1" -> dt1.catalogString,
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
index 60586e4166c..878ad91c088 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
@@ -342,7 +342,7 @@ case class Intersect(
     right: LogicalPlan,
     isAll: Boolean) extends SetOperation(left, right) {

-  override def nodeName: String = getClass.getSimpleName + ( if ( isAll ) "All" else "" )
+  override def nodeName: String = getClass.getSimpleName + ( if ( isAll ) " All" else "" )

   final override val nodePatterns: Seq[TreePattern] = Seq(INTERSECT)
@@ -372,7 +372,7 @@ case class Except(
     left: LogicalPlan,
     right: LogicalPlan,
     isAll: Boolean) extends SetOperation(left, right) {

-  override def nodeName: String =
```
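Error-class messages like `NUM_COLUMNS_MISMATCH` are templates whose `<name>` placeholders are filled from a `messageParameters` map at error-raising time (the map keys appear in the `CheckAnalysis.scala` hunk). The sketch below illustrates that substitution with a naive helper; the template text is the new message from this commit, but `ErrorTemplate.render` is an invented stand-in, not Spark's actual error-class reader:

```scala
object ErrorTemplate {
  // New NUM_COLUMNS_MISMATCH template; placeholder names match the
  // messageParameters keys used in CheckAnalysis.scala.
  val template: String =
    "<operator> can only be performed on inputs with the same number of columns, " +
      "but the first input has <firstNumColumns> columns and the <invalidOrdinalNum> " +
      "input has <invalidNumColumns> columns."

  // Naive placeholder substitution: replace each <key> with its value.
  def render(params: Map[String, String]): String =
    params.foldLeft(template) { case (msg, (k, v)) => msg.replace(s"<$k>", v) }
}
```

For example, a three-column branch in a two-column `UNION` would render as "UNION can only be performed on inputs with the same number of columns, but the first input has 2 columns and the second input has 3 columns." The word "inputs" (rather than "tables") is what makes the message generic enough for subqueries and inline views.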
[spark] branch master updated: [SPARK-41468][SQL][FOLLOWUP] Handle NamedLambdaVariables in EquivalentExpressions
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 27f4d1ef848 [SPARK-41468][SQL][FOLLOWUP] Handle NamedLambdaVariables in EquivalentExpressions

27f4d1ef848 is described below:

commit 27f4d1ef848caf357faaf90d7ee4f625e0a3b5d3
Author: Peter Toth
AuthorDate: Tue Dec 13 17:05:08 2022 +0800

[SPARK-41468][SQL][FOLLOWUP] Handle NamedLambdaVariables in EquivalentExpressions

### What changes were proposed in this pull request?
This is a follow-up PR to https://github.com/apache/spark/pull/39010 to handle `NamedLambdaVariable`s too.

### Why are the changes needed?
To avoid possible issues with higher-order functions.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing UTs.

Closes #39046 from peter-toth/SPARK-41468-fix-planexpressions-in-equivalentexpressions-follow-up.

Authored-by: Peter Toth
Signed-off-by: Wenchen Fan
---
 .../spark/sql/catalyst/expressions/EquivalentExpressions.scala | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

```diff
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
index 3ffd9f9d887..330d66a21be 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
@@ -144,9 +144,10 @@ class EquivalentExpressions {
   private def supportedExpression(e: Expression) = {
     !e.exists {
-      // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
-      // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
+      // `LambdaVariable` is usually used as a loop variable and `NamedLambdaVariable` is used in
+      // higher-order functions, which can't be evaluated ahead of the execution.
       case _: LambdaVariable => true
+      case _: NamedLambdaVariable => true

       // `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor,
       // can cause error like NPE.
```
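The filtering idea in this commit can be shown with a simplified, self-contained model (the real code lives in Spark's `EquivalentExpressions` and operates on Catalyst `Expression` trees): an expression is not a candidate for common-subexpression elimination if any node in its tree is a lambda/loop variable, because such variables only have values inside the enclosing loop or higher-order function. The `Expr` hierarchy below is invented for illustration:

```scala
sealed trait Expr {
  def children: Seq[Expr]
  // Tree-wide predicate, analogous to Catalyst's Expression.exists.
  def exists(p: Expr => Boolean): Boolean = p(this) || children.exists(_.exists(p))
}
case class Add(left: Expr, right: Expr) extends Expr { def children: Seq[Expr] = Seq(left, right) }
case class Literal(v: Int) extends Expr { def children: Seq[Expr] = Nil }
case class LambdaVar(name: String) extends Expr { def children: Seq[Expr] = Nil }
case class NamedLambdaVar(name: String) extends Expr { def children: Seq[Expr] = Nil }

object CseSupport {
  // Mirrors the shape of supportedExpression in the diff above:
  // reject any expression containing a lambda-bound variable.
  def supportedExpression(e: Expr): Boolean = !e.exists {
    case _: LambdaVar => true       // loop variable: not evaluable ahead of the loop
    case _: NamedLambdaVar => true  // higher-order function variable: same restriction
    case _ => false
  }
}
```

The follow-up simply adds the second `case` arm: before this commit only the codegen-style loop variable was rejected, so a subexpression referencing a higher-order function's named lambda variable could incorrectly be hoisted.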