[spark] branch master updated: [SPARK-38739][SQL][TESTS] Test the error class: INVALID_SYNTAX_FOR_CAST
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7221ea31b6b [SPARK-38739][SQL][TESTS] Test the error class: INVALID_SYNTAX_FOR_CAST 7221ea31b6b is described below commit 7221ea31b6bbad0d87b22e5413b8979bee56321c Author: panbingkun AuthorDate: Fri May 13 23:20:42 2022 +0300 [SPARK-38739][SQL][TESTS] Test the error class: INVALID_SYNTAX_FOR_CAST ## What changes were proposed in this pull request? This PR aims to add a test for the error class INVALID_SYNTAX_FOR_CAST to `QueryExecutionErrors`. Also the method `invalidInputSyntaxForNumericError` is removed as no longer used. ### Why are the changes needed? The changes improve test coverage, and document expected error messages in tests. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? By running new test: ``` $ build/sbt "test:testOnly *QueryExecutionAnsiErrorsSuite" ``` Closes #36493 from panbingkun/SPARK-38739. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../apache/spark/sql/errors/QueryExecutionErrors.scala | 9 + .../sql/errors/QueryExecutionAnsiErrorsSuite.scala | 17 - 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala index 447a820a128..e687417d7cc 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala @@ -115,17 +115,10 @@ object QueryExecutionErrors extends QueryErrorsBase { context)) } - def invalidInputSyntaxForNumericError( - e: NumberFormatException, - errorContext: String): NumberFormatException = { -new NumberFormatException(s"${e.getMessage}. To return NULL instead, use 'try_cast'. " + - s"If necessary set ${SQLConf.ANSI_ENABLED.key} to false to bypass this error." 
+ errorContext) - } - def invalidInputSyntaxForNumericError( to: DataType, s: UTF8String, - errorContext: String): NumberFormatException = { + errorContext: String): SparkNumberFormatException = { new SparkNumberFormatException(errorClass = "INVALID_SYNTAX_FOR_CAST", messageParameters = Array(toSQLType(to), toSQLValue(s, StringType), SQLConf.ANSI_ENABLED.key, errorContext)) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionAnsiErrorsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionAnsiErrorsSuite.scala index 78b78f99ab0..8aef4c6f345 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionAnsiErrorsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionAnsiErrorsSuite.scala @@ -16,7 +16,7 @@ */ package org.apache.spark.sql.errors -import org.apache.spark.{SparkArithmeticException, SparkArrayIndexOutOfBoundsException, SparkConf, SparkDateTimeException, SparkNoSuchElementException} +import org.apache.spark.{SparkArithmeticException, SparkArrayIndexOutOfBoundsException, SparkConf, SparkDateTimeException, SparkNoSuchElementException, SparkNumberFormatException} import org.apache.spark.sql.QueryTest import org.apache.spark.sql.internal.SQLConf @@ -124,4 +124,19 @@ class QueryExecutionAnsiErrorsSuite extends QueryTest with QueryErrorsSuiteBase |""".stripMargin ) } + + test("INVALID_SYNTAX_FOR_CAST: cast string to double") { +checkErrorClass( + exception = intercept[SparkNumberFormatException] { +sql("select CAST('xe23' AS DOUBLE)").collect() + }, + errorClass = "INVALID_SYNTAX_FOR_CAST", + msg = """Invalid input syntax for type "DOUBLE": 'xe23'. """ + +"""To return NULL instead, use 'try_cast'. If necessary set """ + +"""spark.sql.ansi.enabled to false to bypass this error. + |== SQL(line 1, position 7) == + |select CAST('xe23' AS DOUBLE) + | ^^ + |""".stripMargin) + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
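For reference, a minimal sketch of the behavior the new test exercises, assuming a local SparkSession bound to a variable named `spark` with ANSI mode enabled; the failing cast is left commented out so the snippet runs end to end:

```
// Assumes a SparkSession named `spark`; the config key comes from the error message above.
spark.conf.set("spark.sql.ansi.enabled", "true")

// Throws SparkNumberFormatException with error class INVALID_SYNTAX_FOR_CAST,
// since 'xe23' is not a valid double literal:
// spark.sql("SELECT CAST('xe23' AS DOUBLE)").collect()

// Returns NULL instead of failing, as the error message suggests:
spark.sql("SELECT try_cast('xe23' AS DOUBLE)").show()
```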
[spark] branch master updated (3d9049533d8 -> 23d0879a585)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 3d9049533d8 [SPARK-39181][SQL] `SessionCatalog.reset` should not drop temp functions twice
     add 23d0879a585 [SPARK-39182][BUILD] Upgrade to Arrow 8.0.0

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 8
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 8
 pom.xml                               | 2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-39181][SQL] `SessionCatalog.reset` should not drop temp functions twice
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3d9049533d8 [SPARK-39181][SQL] `SessionCatalog.reset` should not drop temp functions twice 3d9049533d8 is described below commit 3d9049533d8bc75cc2c4832dc2fd6f8ac53efe21 Author: Wenchen Fan AuthorDate: Fri May 13 10:21:15 2022 -0700 [SPARK-39181][SQL] `SessionCatalog.reset` should not drop temp functions twice ### What changes were proposed in this pull request? `SessionCatalog.reset` is a test only API and it drops the temp functions twice currently: 1. once with `listFunctions(DEFAULT_DATABASE)...` which list temp functions as well 2. once with `functionRegistry.clear()` This PR changes `listFunctions` to `externalCatalog.listFunctions` which only list permanent functions. ### Why are the changes needed? code simplification and probably makes tests run faster ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A Closes #36542 from cloud-fan/minor. Authored-by: Wenchen Fan Signed-off-by: Dongjoon Hyun --- .../apache/spark/sql/catalyst/catalog/SessionCatalog.scala| 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index 6b7f8a207d6..d6c80f98bf7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala @@ -1818,13 +1818,10 @@ class SessionCatalog( listTables(DEFAULT_DATABASE).foreach { table => dropTable(table, ignoreIfNotExists = false, purge = false) } -listFunctions(DEFAULT_DATABASE).map(_._1).foreach { func => - if (func.database.isDefined) { -dropFunction(func, ignoreIfNotExists = false) - } else { -dropTempFunction(func.funcName, ignoreIfNotExists = false) - } -} +// Temp functions are dropped below, we only need to drop permanent functions here. +externalCatalog.listFunctions(DEFAULT_DATABASE, "*").map { f => + FunctionIdentifier(f, Some(DEFAULT_DATABASE)) +}.foreach(dropFunction(_, ignoreIfNotExists = false)) clearTempTables() globalTempViewManager.clear() functionRegistry.clear() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
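For context, a rough sketch of the temporary-vs-permanent distinction this simplification relies on, written against the public Catalog API; it assumes a SparkSession named `spark`, and `plus_one` is just an illustrative name:

```
// Temporary functions live in the session's FunctionRegistry, not in the external
// catalog, so reset() only needs to list permanent functions from the external
// catalog and can leave the temporary ones to functionRegistry.clear().
spark.udf.register("plus_one", (x: Int) => x + 1)     // registers a temporary function
println(spark.catalog.functionExists("plus_one"))     // true

// Temporary functions are reported with no database and isTemporary = true:
spark.catalog.listFunctions()
  .filter("name = 'plus_one'")
  .select("name", "database", "isTemporary")
  .show()
```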
[spark] branch master updated: [SPARK-38751][SQL][TESTS] Test the error class: UNRECOGNIZED_SQL_TYPE
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bbf3a2eafa0 [SPARK-38751][SQL][TESTS] Test the error class: UNRECOGNIZED_SQL_TYPE bbf3a2eafa0 is described below commit bbf3a2eafa004f712799261ef883dcc457a072fd Author: panbingkun AuthorDate: Fri May 13 19:29:02 2022 +0300 [SPARK-38751][SQL][TESTS] Test the error class: UNRECOGNIZED_SQL_TYPE ## What changes were proposed in this pull request? This PR aims to add a test for the error class UNRECOGNIZED_SQL_TYPE to `QueryExecutionErrorsSuite`. ### Why are the changes needed? The changes improve test coverage, and document expected error messages in tests. ### Does this PR introduce any user-facing change? No ### How was this patch tested? By running new test: ``` $ build/sbt "sql/testOnly *QueryExecutionErrorsSuite*" ``` Closes #36463 from panbingkun/SPARK-38751. Authored-by: panbingkun Signed-off-by: Max Gekk --- .../sql/errors/QueryExecutionErrorsSuite.scala | 89 +- 1 file changed, 86 insertions(+), 3 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala index 7a5592c148a..cf1551298a8 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala @@ -19,23 +19,27 @@ package org.apache.spark.sql.errors import java.io.IOException import java.net.URL -import java.util.{Locale, ServiceConfigurationError} +import java.sql.{Connection, DriverManager, PreparedStatement, ResultSet, ResultSetMetaData} +import java.util.{Locale, Properties, ServiceConfigurationError} import org.apache.hadoop.fs.{LocalFileSystem, Path} import org.apache.hadoop.fs.permission.FsPermission +import org.mockito.Mockito.{mock, when} import test.org.apache.spark.sql.connector.JavaSimpleWritableDataSource -import org.apache.spark.{SparkArithmeticException, SparkClassNotFoundException, SparkException, SparkIllegalArgumentException, SparkIllegalStateException, SparkRuntimeException, SparkSecurityException, SparkUnsupportedOperationException, SparkUpgradeException} +import org.apache.spark.{SparkArithmeticException, SparkClassNotFoundException, SparkException, SparkIllegalArgumentException, SparkIllegalStateException, SparkRuntimeException, SparkSecurityException, SparkSQLException, SparkUnsupportedOperationException, SparkUpgradeException} import org.apache.spark.sql.{AnalysisException, DataFrame, QueryTest, SaveMode} import org.apache.spark.sql.catalyst.util.BadRecordException import org.apache.spark.sql.connector.SimpleWritableDataSource import org.apache.spark.sql.execution.QueryExecutionException +import org.apache.spark.sql.execution.datasources.jdbc.{DriverRegistry, JDBCOptions} import org.apache.spark.sql.execution.datasources.orc.OrcTest import org.apache.spark.sql.execution.datasources.parquet.ParquetTest import org.apache.spark.sql.functions.{lit, lower, struct, sum, udf} import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.internal.SQLConf.LegacyBehaviorPolicy.EXCEPTION -import org.apache.spark.sql.types.{DecimalType, StructType, TimestampType} +import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects} +import org.apache.spark.sql.types.{DataType, DecimalType, MetadataBuilder, StructType, 
TimestampType} import org.apache.spark.sql.util.ArrowUtils import org.apache.spark.util.Utils @@ -514,6 +518,85 @@ class QueryExecutionErrorsSuite "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister") } } + + test("UNRECOGNIZED_SQL_TYPE: unrecognized SQL type -100") { +Utils.classForName("org.h2.Driver") + +val properties = new Properties() +properties.setProperty("user", "testUser") +properties.setProperty("password", "testPass") + +val url = "jdbc:h2:mem:testdb0" +val urlWithUserAndPass = "jdbc:h2:mem:testdb0;user=testUser;password=testPass" +val tableName = "test.table1" +val unrecognizedColumnType = -100 + +var conn: java.sql.Connection = null +try { + conn = DriverManager.getConnection(url, properties) + conn.prepareStatement("create schema test").executeUpdate() + conn.commit() + + conn.prepareStatement(s"create table $tableName (a INT)").executeUpdate() + conn.prepareStatement( +s"insert into $tableName values (1)").executeUpdate() + conn.commit() +} finally { + if (null != conn) { +conn.close() + } +} + +val testH2DialectUnrecognizedSQLType = new JdbcDialect { + override def canHandle(url: String):
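The new test builds on the usual pattern for plugging in a custom `JdbcDialect`; below is a self-contained sketch of that pattern (the dialect is a placeholder rather than the mocked one from the test, and the JDBC read itself is elided):

```
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// A throwaway dialect that matches H2 URLs; canHandle is the only abstract member.
val testDialect = new JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:h2")
}

JdbcDialects.registerDialect(testDialect)
try {
  // ... run spark.read.jdbc(...) against the H2 URL here ...
} finally {
  JdbcDialects.unregisterDialect(testDialect)  // restore default dialect resolution for later suites
}
```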
[spark] branch master updated: [SPARK-39138][SQL] Add ANSI general value specification and function - user
This is an automated email from the ASF dual-hosted git repository. yao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new aabe3a75454 [SPARK-39138][SQL] Add ANSI general value specification and function - user aabe3a75454 is described below commit aabe3a75454462c9ce665a3d947a7edb91f37d13 Author: Kent Yao AuthorDate: Sat May 14 00:03:23 2022 +0800 [SPARK-39138][SQL] Add ANSI general value specification and function - user ### What changes were proposed in this pull request? Add ANSI general value specification and function - user ### Why are the changes needed? According to ANSI SQL, ``` CURRENT_USER and USER are semantically the same ``` USER is also supported by other systems like MySQL, PG, hive, etc. ### Does this PR introduce _any_ user-facing change? new function added for ansi mode, and if enforceReservedKeywords, USER always reserved otherwise, it will be resolved as a literal if no attribute matches. ### How was this patch tested? new tests Closes #36497 from yaooqinn/SPARK-39138. Authored-by: Kent Yao Signed-off-by: Kent Yao --- .../spark/sql/catalyst/parser/SqlBaseParser.g4 | 2 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 1 + .../sql/catalyst/analysis/FunctionRegistry.scala | 1 + .../spark/sql/catalyst/parser/AstBuilder.scala | 2 +- .../sql-functions/sql-expression-schema.md | 1 + .../org/apache/spark/sql/MiscFunctionsSuite.scala | 15 +++-- .../ThriftServerWithSparkContextSuite.scala| 25 ++ 7 files changed, 30 insertions(+), 17 deletions(-) diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 index 7eace05b6b2..ed57e9062c1 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 @@ -816,7 +816,7 @@ datetimeUnit ; primaryExpression -: name=(CURRENT_DATE | CURRENT_TIMESTAMP | CURRENT_USER) #currentLike +: name=(CURRENT_DATE | CURRENT_TIMESTAMP | CURRENT_USER | USER) #currentLike | name=(TIMESTAMPADD | DATEADD) LEFT_PAREN unit=datetimeUnit COMMA unitsAmount=valueExpression COMMA timestamp=valueExpression RIGHT_PAREN #timestampadd | name=(TIMESTAMPDIFF | DATEDIFF) LEFT_PAREN unit=datetimeUnit COMMA startTimestamp=valueExpression COMMA endTimestamp=valueExpression RIGHT_PAREN #timestampdiff | CASE whenClause+ (ELSE elseExpression=expression)? 
END #searchedCase diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 817a62fd1d8..20c1756ef4e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -1682,6 +1682,7 @@ class Analyzer(override val catalogManager: CatalogManager) (CurrentDate().prettyName, () => CurrentDate(), toPrettySQL(_)), (CurrentTimestamp().prettyName, () => CurrentTimestamp(), toPrettySQL(_)), (CurrentUser().prettyName, () => CurrentUser(), toPrettySQL), +("user", () => CurrentUser(), toPrettySQL), (VirtualColumn.hiveGroupingIdName, () => GroupingID(Nil), _ => VirtualColumn.hiveGroupingIdName) ) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index 50f376c0ce6..5084753d2d4 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -714,6 +714,7 @@ object FunctionRegistry { expression[CurrentDatabase]("current_database"), expression[CurrentCatalog]("current_catalog"), expression[CurrentUser]("current_user"), +expression[CurrentUser]("user", setAlias = true), expression[CallMethodViaReflection]("reflect"), expression[CallMethodViaReflection]("java_method", true), expression[SparkVersion]("version"), diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala index e6f7dba863b..ff3b99fb815 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala +++
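In user code the change looks like this (a usage sketch, assuming a SparkSession named `spark`):

```
// user() is registered as an alias of current_user(); both return the session user.
spark.sql("SELECT current_user(), user()").show()

// Under ANSI mode, USER is also accepted as a bare general value specification:
// spark.conf.set("spark.sql.ansi.enabled", "true")
// spark.sql("SELECT user").show()
```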
[spark] branch branch-3.3 updated: [SPARK-39157][SQL] H2Dialect should override getJDBCType so as make the data type is correct
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 30834b847e7 [SPARK-39157][SQL] H2Dialect should override getJDBCType so as make the data type is correct 30834b847e7 is described below commit 30834b847e7577cf694558d43fb618fc0b1eb09e Author: Jiaan Geng AuthorDate: Fri May 13 22:01:04 2022 +0800 [SPARK-39157][SQL] H2Dialect should override getJDBCType so as make the data type is correct ### What changes were proposed in this pull request? Currently, `H2Dialect` not implement `getJDBCType` of `JdbcDialect`, so the DS V2 push-down will throw exception show below: ``` Job aborted due to stage failure: Task 0 in stage 13.0 failed 1 times, most recent failure: Lost task 0.0 in stage 13.0 (TID 13) (jiaan-gengdembp executor driver): org.h2.jdbc.JdbcSQLNonTransientException: Unknown data type: "STRING"; SQL statement: SELECT "DEPT","NAME","SALARY","BONUS","IS_MANAGER" FROM "test"."employee" WHERE ("BONUS" IS NOT NULL) AND ("DEPT" IS NOT NULL) AND (CAST("BONUS" AS string) LIKE '%30%') AND (CAST("DEPT" AS byte) > 1) AND (CAST("DEPT" AS short) > 1) AND (CAST("BONUS" AS decimal(20,2)) > 1200.00)[50004-210] ``` H2Dialect should implement `getJDBCType` of `JdbcDialect`. ### Why are the changes needed? make the H2 data type is correct. ### Does this PR introduce _any_ user-facing change? 'Yes'. Fix a bug for `H2Dialect`. ### How was this patch tested? New tests. Closes #36516 from beliefer/SPARK-39157. Authored-by: Jiaan Geng Signed-off-by: Wenchen Fan (cherry picked from commit fa3f096e02d408fbeab5f69af451ef8bc8f5b3db) Signed-off-by: Wenchen Fan --- .../scala/org/apache/spark/sql/jdbc/H2Dialect.scala | 13 - .../scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala | 17 + 2 files changed, 29 insertions(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala index 0aa971c0d3a..56cadbe8e2c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala @@ -17,7 +17,7 @@ package org.apache.spark.sql.jdbc -import java.sql.SQLException +import java.sql.{SQLException, Types} import java.util.Locale import scala.util.control.NonFatal @@ -27,6 +27,8 @@ import org.apache.spark.sql.catalyst.analysis.{NoSuchNamespaceException, NoSuchT import org.apache.spark.sql.connector.expressions.Expression import org.apache.spark.sql.connector.expressions.aggregate.{AggregateFunc, GeneralAggregateFunc} import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils +import org.apache.spark.sql.types.{BooleanType, ByteType, DataType, DecimalType, ShortType, StringType} private object H2Dialect extends JdbcDialect { override def canHandle(url: String): Boolean = @@ -90,6 +92,15 @@ private object H2Dialect extends JdbcDialect { ) } + override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { +case StringType => Option(JdbcType("CLOB", Types.CLOB)) +case BooleanType => Some(JdbcType("BOOLEAN", Types.BOOLEAN)) +case ShortType | ByteType => Some(JdbcType("SMALLINT", Types.SMALLINT)) +case t: DecimalType => Some( + JdbcType(s"NUMERIC(${t.precision},${t.scale})", Types.NUMERIC)) +case _ => JdbcUtils.getCommonJDBCType(dt) + } + override def classifyException(message: String, e: 
Throwable): AnalysisException = { e match { case exception: SQLException => diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala index b6f36b912f8..91526cef507 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala @@ -466,6 +466,23 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel checkFiltersRemoved(df7, false) checkPushedInfo(df7, "PushedFilters: [DEPT IS NOT NULL]") checkAnswer(df7, Seq(Row(6, "jen", 12000, 1200, true))) + +val df8 = sql( + """ +|SELECT * FROM h2.test.employee +|WHERE cast(bonus as string) like '%30%' +|AND cast(dept as byte) > 1 +|AND cast(dept as short) > 1 +|AND cast(bonus as decimal(20, 2)) > 1200""".stripMargin) +checkFiltersRemoved(df8, ansiMode) +val expectedPlanFragment8 = if (ansiMode) { + "PushedFilters: [BONUS IS NOT NULL, DEPT IS NOT NULL, " + +"CAST(BONUS AS string)
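The fix follows the general `JdbcDialect.getJDBCType` contract; here is a condensed, stand-alone sketch of that contract (an example dialect, not the actual H2Dialect source):

```
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType}
import org.apache.spark.sql.types.{DataType, StringType}

val exampleDialect = new JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:h2")

  // Map Catalyst types the target database does not understand to concrete JDBC
  // types; returning None lets Spark fall back to its common JDBC type mapping.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("CLOB", Types.CLOB))  // H2 has no "STRING" type
    case _ => None
  }
}
```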
[spark] branch master updated (d7317b03e97 -> fa3f096e02d)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from d7317b03e97 [SPARK-39178][CORE] SparkFatalException should show root cause when print error stack
     add fa3f096e02d [SPARK-39157][SQL] H2Dialect should override getJDBCType so as make the data type is correct

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/jdbc/H2Dialect.scala   | 13 -
 .../scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala | 17 +
 2 files changed, 29 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-39178][CORE] SparkFatalException should show root cause when print error stack
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new e743e68ce62 [SPARK-39178][CORE] SparkFatalException should show root cause when print error stack e743e68ce62 is described below commit e743e68ce62e18ced6c49a22f5d101c72b7bfbe2 Author: Angerszh AuthorDate: Fri May 13 16:47:11 2022 +0300 [SPARK-39178][CORE] SparkFatalException should show root cause when print error stack ### What changes were proposed in this pull request? Our user meet an case when running broadcast, throw `SparkFatalException`, but in error stack, it don't show the error case. ### Why are the changes needed? Make exception more clear ### Does this PR introduce _any_ user-facing change? User can got root cause when application throw `SparkFatalException`. ### How was this patch tested? For ut ``` test("") { throw new SparkFatalException( new OutOfMemoryError("Not enough memory to build and broadcast the table to all " + "worker nodes. As a workaround, you can either disable broadcast by setting " + s"driver memory by setting ${SparkLauncher.DRIVER_MEMORY} to a higher value.") .initCause(null)) } ``` Before this pr: ``` [info] org.apache.spark.util.SparkFatalException: [info] at org.apache.spark.SparkContextSuite.$anonfun$new$1(SparkContextSuite.scala:59) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190) [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203) [info] at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188) [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182) [info] at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:64) [info] at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) [info] at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:64) [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233) [info] at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) [info] at scala.collection.immutable.List.foreach(List.scala:431) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232) [info] at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563) [info] at org.scalatest.Suite.run(Suite.scala:1112) ``` After this pr: ``` [info] org.apache.spark.util.SparkFatalException: 
java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting driver memory by setting spark.driver.memory to a higher value. [info] at org.apache.spark.SparkContextSuite.$anonfun$new$1(SparkContextSuite.scala:59) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190) [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203) [info] at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188) [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200) [info] at
[spark] branch master updated: [SPARK-39178][CORE] SparkFatalException should show root cause when print error stack
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d7317b03e97 [SPARK-39178][CORE] SparkFatalException should show root cause when print error stack d7317b03e97 is described below commit d7317b03e975f8dc1a8c276dd0a931e00c478717 Author: Angerszh AuthorDate: Fri May 13 16:47:11 2022 +0300 [SPARK-39178][CORE] SparkFatalException should show root cause when print error stack ### What changes were proposed in this pull request? Our user meet an case when running broadcast, throw `SparkFatalException`, but in error stack, it don't show the error case. ### Why are the changes needed? Make exception more clear ### Does this PR introduce _any_ user-facing change? User can got root cause when application throw `SparkFatalException`. ### How was this patch tested? For ut ``` test("") { throw new SparkFatalException( new OutOfMemoryError("Not enough memory to build and broadcast the table to all " + "worker nodes. As a workaround, you can either disable broadcast by setting " + s"driver memory by setting ${SparkLauncher.DRIVER_MEMORY} to a higher value.") .initCause(null)) } ``` Before this pr: ``` [info] org.apache.spark.util.SparkFatalException: [info] at org.apache.spark.SparkContextSuite.$anonfun$new$1(SparkContextSuite.scala:59) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190) [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203) [info] at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188) [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200) [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182) [info] at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:64) [info] at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) [info] at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:64) [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233) [info] at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) [info] at scala.collection.immutable.List.foreach(List.scala:431) [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233) [info] at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232) [info] at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563) [info] at org.scalatest.Suite.run(Suite.scala:1112) ``` After this pr: ``` [info] org.apache.spark.util.SparkFatalException: 
java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting driver memory by setting spark.driver.memory to a higher value. [info] at org.apache.spark.SparkContextSuite.$anonfun$new$1(SparkContextSuite.scala:59) [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) [info] at org.scalatest.Transformer.apply(Transformer.scala:22) [info] at org.scalatest.Transformer.apply(Transformer.scala:20) [info] at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190) [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203) [info] at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188) [info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200) [info] at
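The underlying mechanism is ordinary JVM exception chaining; a small, framework-free illustration (generic exception types, not the actual SparkFatalException change):

```
// When the cause is passed to the wrapper, the printed stack trace gains a
// "Caused by:" section with the root cause; without it, the root cause is invisible.
val rootCause = new IllegalStateException(
  "Not enough memory to build and broadcast the table to all worker nodes.")

val opaque      = new RuntimeException("fatal error")            // root cause lost
val transparent = new RuntimeException("fatal error", rootCause) // root cause preserved

transparent.printStackTrace() // includes "Caused by: java.lang.IllegalStateException: ..."
```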
[spark] branch master updated: [SPARK-39177][SQL] Provide query context on map key not exists error when WSCG is off
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1afddf40743 [SPARK-39177][SQL] Provide query context on map key not exists error when WSCG is off 1afddf40743 is described below commit 1afddf407436c3b315ec601fab5a4a1b2028e672 Author: Gengliang Wang AuthorDate: Fri May 13 21:45:06 2022 +0800 [SPARK-39177][SQL] Provide query context on map key not exists error when WSCG is off ### What changes were proposed in this pull request? Similar to https://github.com/apache/spark/pull/36525, this PR provides query context for "map key not exists" runtime error when WSCG is off. ### Why are the changes needed? Enhance the runtime error query context for "map key not exists" runtime error. After changes, it works when the whole stage codegen is not available. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes #36538 from gengliangwang/fixMapKeyContext. Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang --- .../catalyst/expressions/collectionOperations.scala | 6 ++ .../catalyst/expressions/complexTypeExtractors.scala | 13 ++--- .../scala/org/apache/spark/sql/SQLQuerySuite.scala| 19 +++ 3 files changed, 35 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index 1b42ea5eb87..1bd934214f5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -2261,6 +2261,12 @@ case class ElementAt( override protected def withNewChildrenInternal( newLeft: Expression, newRight: Expression): ElementAt = copy(left = newLeft, right = newRight) + + override def initQueryContext(): String = if (failOnError) { +origin.context + } else { +"" + } } /** diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala index 45661c00c51..b84050c1837 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala @@ -339,7 +339,8 @@ trait GetArrayItemUtil { /** * Common trait for [[GetMapValue]] and [[ElementAt]]. */ -trait GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes { +trait GetMapValueUtil + extends BinaryExpression with ImplicitCastInputTypes with SupportQueryContext { // todo: current search is O(n), improve it. 
def getValueEval( @@ -365,7 +366,7 @@ trait GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes { if (!found) { if (failOnError) { -throw QueryExecutionErrors.mapKeyNotExistError(ordinal, keyType, origin.context) +throw QueryExecutionErrors.mapKeyNotExistError(ordinal, keyType, queryContext) } else { null } @@ -398,7 +399,7 @@ trait GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes { } val keyJavaType = CodeGenerator.javaType(keyType) -lazy val errorContext = ctx.addReferenceObj("errCtx", origin.context) +lazy val errorContext = ctx.addReferenceObj("errCtx", queryContext) val keyDt = ctx.addReferenceObj("keyType", keyType, keyType.getClass.getName) nullSafeCodeGen(ctx, ev, (eval1, eval2) => { val keyNotFoundBranch = if (failOnError) { @@ -488,4 +489,10 @@ case class GetMapValue( override protected def withNewChildrenInternal( newLeft: Expression, newRight: Expression): GetMapValue = copy(child = newLeft, key = newRight) + + override def initQueryContext(): String = if (failOnError) { +origin.context + } else { +"" + } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index c0f609ad817..0355bd90d04 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -4357,6 +4357,25 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark } } + test("SPARK-39177: Query context of getting map value should be serialized to executors" + +" when WSCG is off") { +withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false", + SQLConf.ANSI_ENABLED.key ->
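A minimal reproduction sketch of the error whose context is now propagated, assuming a SparkSession named `spark`; the failing query is left commented out so the snippet runs cleanly:

```
// Disable whole-stage codegen and enable ANSI mode, mirroring the new test setup.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
spark.conf.set("spark.sql.ansi.enabled", "true")

// Fails because key 3 does not exist in the map; with this change the error
// message also carries the "== SQL ==" fragment pointing at the expression:
// spark.sql("SELECT map(1, 'a', 2, 'b')[3]").collect()
```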
[spark] branch branch-3.3 updated: [SPARK-39177][SQL] Provide query context on map key not exists error when WSCG is off
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 1a49de67e3f [SPARK-39177][SQL] Provide query context on map key not exists error when WSCG is off 1a49de67e3f is described below commit 1a49de67e3fa0d25e84540313688cde82d6001df Author: Gengliang Wang AuthorDate: Fri May 13 21:45:06 2022 +0800 [SPARK-39177][SQL] Provide query context on map key not exists error when WSCG is off ### What changes were proposed in this pull request? Similar to https://github.com/apache/spark/pull/36525, this PR provides query context for "map key not exists" runtime error when WSCG is off. ### Why are the changes needed? Enhance the runtime error query context for "map key not exists" runtime error. After changes, it works when the whole stage codegen is not available. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes #36538 from gengliangwang/fixMapKeyContext. Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang (cherry picked from commit 1afddf407436c3b315ec601fab5a4a1b2028e672) Signed-off-by: Gengliang Wang --- .../catalyst/expressions/collectionOperations.scala | 6 ++ .../catalyst/expressions/complexTypeExtractors.scala | 13 ++--- .../scala/org/apache/spark/sql/SQLQuerySuite.scala| 19 +++ 3 files changed, 35 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index 1b42ea5eb87..1bd934214f5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -2261,6 +2261,12 @@ case class ElementAt( override protected def withNewChildrenInternal( newLeft: Expression, newRight: Expression): ElementAt = copy(left = newLeft, right = newRight) + + override def initQueryContext(): String = if (failOnError) { +origin.context + } else { +"" + } } /** diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala index 45661c00c51..b84050c1837 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala @@ -339,7 +339,8 @@ trait GetArrayItemUtil { /** * Common trait for [[GetMapValue]] and [[ElementAt]]. */ -trait GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes { +trait GetMapValueUtil + extends BinaryExpression with ImplicitCastInputTypes with SupportQueryContext { // todo: current search is O(n), improve it. 
def getValueEval( @@ -365,7 +366,7 @@ trait GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes { if (!found) { if (failOnError) { -throw QueryExecutionErrors.mapKeyNotExistError(ordinal, keyType, origin.context) +throw QueryExecutionErrors.mapKeyNotExistError(ordinal, keyType, queryContext) } else { null } @@ -398,7 +399,7 @@ trait GetMapValueUtil extends BinaryExpression with ImplicitCastInputTypes { } val keyJavaType = CodeGenerator.javaType(keyType) -lazy val errorContext = ctx.addReferenceObj("errCtx", origin.context) +lazy val errorContext = ctx.addReferenceObj("errCtx", queryContext) val keyDt = ctx.addReferenceObj("keyType", keyType, keyType.getClass.getName) nullSafeCodeGen(ctx, ev, (eval1, eval2) => { val keyNotFoundBranch = if (failOnError) { @@ -488,4 +489,10 @@ case class GetMapValue( override protected def withNewChildrenInternal( newLeft: Expression, newRight: Expression): GetMapValue = copy(child = newLeft, key = newRight) + + override def initQueryContext(): String = if (failOnError) { +origin.context + } else { +"" + } } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala index 68db57ea364..21ce009a907 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala @@ -4404,6 +4404,25 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark } } + test("SPARK-39177: Query context of getting map value should be serialized to executors" + +" when
[spark] branch branch-3.3 updated: [SPARK-39164][SQL][3.3] Wrap asserts/illegal state exceptions by the INTERNAL_ERROR exception in actions
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 1372f312052 [SPARK-39164][SQL][3.3] Wrap asserts/illegal state exceptions by the INTERNAL_ERROR exception in actions 1372f312052 is described below commit 1372f312052dd0361e371e2ed63436f3e299c617 Author: Max Gekk AuthorDate: Fri May 13 16:43:53 2022 +0300 [SPARK-39164][SQL][3.3] Wrap asserts/illegal state exceptions by the INTERNAL_ERROR exception in actions ### What changes were proposed in this pull request? In the PR, I propose to catch `java.lang.IllegalStateException` and `java.lang.AssertionError` (raised by asserts), and wrap them by Spark's exception w/ the `INTERNAL_ERROR` error class. The modification affects only actions so far. This PR affects the case of missing bucket file. After the changes, Spark throws `SparkException` w/ `INTERNAL_ERROR` instead of `IllegalStateException`. Since this is not Spark's illegal state, the exception should be replaced by another runtime exception. Created the ticket SPARK-39163 to fix this. This is a backport of https://github.com/apache/spark/pull/36500. ### Why are the changes needed? To improve user experience with Spark SQL and unify representation of internal errors by using error classes like for other errors. Usually, users shouldn't observe asserts and illegal states, but even if such situation happens, they should see errors in the same way as other errors (w/ error class `INTERNAL_ERROR`). ### Does this PR introduce _any_ user-facing change? Yes. At least, in one particular case, see the modified test suites and SPARK-39163. ### How was this patch tested? By running the affected test suites: ``` $ build/sbt "test:testOnly *.BucketedReadWithoutHiveSupportSuite" $ build/sbt "test:testOnly *.AdaptiveQueryExecSuite" $ build/sbt "test:testOnly *.WholeStageCodegenSuite" ``` Authored-by: Max Gekk Signed-off-by: Max Gekk (cherry picked from commit f5c3f0c228fef7808d1f927e134595ddd4d31723) Signed-off-by: Max Gekk Closes #36533 from MaxGekk/class-internal-error-3.3. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../main/scala/org/apache/spark/sql/Dataset.scala | 21 - .../spark/sql/execution/DataSourceScanExec.scala| 1 + .../org/apache/spark/sql/execution/subquery.scala | 1 + .../scala/org/apache/spark/sql/SubquerySuite.scala | 10 ++ .../sql/execution/WholeStageCodegenSuite.scala | 14 -- .../execution/adaptive/AdaptiveQueryExecSuite.scala | 9 ++--- .../spark/sql/sources/BucketedReadSuite.scala | 8 +--- 7 files changed, 43 insertions(+), 21 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala index 7d16a2f5eee..56f0e8978ec 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala @@ -27,7 +27,7 @@ import scala.util.control.NonFatal import org.apache.commons.lang3.StringUtils -import org.apache.spark.TaskContext +import org.apache.spark.{SparkException, SparkThrowable, TaskContext} import org.apache.spark.annotation.{DeveloperApi, Stable, Unstable} import org.apache.spark.api.java.JavaRDD import org.apache.spark.api.java.function._ @@ -3848,12 +3848,23 @@ class Dataset[T] private[sql]( /** * Wrap a Dataset action to track the QueryExecution and time cost, then report to the - * user-registered callback functions. 
+ * user-registered callback functions, and also to convert asserts/illegal states to + * the internal error exception. */ private def withAction[U](name: String, qe: QueryExecution)(action: SparkPlan => U) = { -SQLExecution.withNewExecutionId(qe, Some(name)) { - qe.executedPlan.resetMetrics() - action(qe.executedPlan) +try { + SQLExecution.withNewExecutionId(qe, Some(name)) { +qe.executedPlan.resetMetrics() +action(qe.executedPlan) + } +} catch { + case e: SparkThrowable => throw e + case e @ (_: java.lang.IllegalStateException | _: java.lang.AssertionError) => +throw new SparkException( + errorClass = "INTERNAL_ERROR", + messageParameters = Array(s"""The "$name" action failed."""), + cause = e) + case e: Throwable => throw e } } diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala index ac0f3af5725..1ec93a614b7 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala +++
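The change is essentially a catch-and-wrap around each Dataset action; below is a simplified, framework-free sketch of that pattern (a plain RuntimeException stands in for Spark's INTERNAL_ERROR exception, and the helper name is made up):

```
// Wrap a block so internal asserts and illegal states surface as one clearly
// labeled error instead of leaking raw JVM exceptions to the caller.
def withInternalErrorWrapping[T](name: String)(body: => T): T =
  try body catch {
    case e @ (_: IllegalStateException | _: AssertionError) =>
      throw new RuntimeException(s"""[INTERNAL_ERROR] The "$name" action failed.""", e)
  }

// Usage: withInternalErrorWrapping("collect") { df.collect() }
```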
[spark] branch branch-3.3 updated: [SPARK-39165][SQL][3.3] Replace `sys.error` by `IllegalStateException`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new c2bd7bac76a [SPARK-39165][SQL][3.3] Replace `sys.error` by `IllegalStateException` c2bd7bac76a is described below commit c2bd7bac76a5cf7ffc5ef61a1df2b8bb5a72f131 Author: Max Gekk AuthorDate: Fri May 13 12:47:53 2022 +0300 [SPARK-39165][SQL][3.3] Replace `sys.error` by `IllegalStateException` ### What changes were proposed in this pull request? Replace all invokes of `sys.error()` by throwing of `IllegalStateException` in the `sql` namespace. This is a backport of https://github.com/apache/spark/pull/36524. ### Why are the changes needed? In the context of wrapping all internal errors like asserts/illegal state exceptions (see https://github.com/apache/spark/pull/36500), it is impossible to distinguish `RuntimeException` of `sys.error()` from Spark's exceptions like `SparkRuntimeException`. The last one can be propagated to the user space but `sys.error` exceptions shouldn't be visible to users in regular cases. ### Does this PR introduce _any_ user-facing change? No, shouldn't. sys.error shouldn't propagate exception to user space in regular cases. ### How was this patch tested? By running the existing test suites. Authored-by: Max Gekk Signed-off-by: Max Gekk (cherry picked from commit 95c7efd7571464d8adfb76fb22e47a5816cf73fb) Signed-off-by: Max Gekk Closes #36532 from MaxGekk/sys_error-internal-3.3. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../scala/org/apache/spark/sql/execution/SparkStrategies.scala| 4 ++-- .../org/apache/spark/sql/execution/datasources/DataSource.scala | 8 .../sql/execution/datasources/parquet/ParquetWriteSupport.scala | 3 +-- .../apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala | 4 ++-- .../org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala | 5 +++-- .../scala/org/apache/spark/sql/execution/streaming/memory.scala | 3 ++- .../execution/streaming/sources/TextSocketMicroBatchStream.scala | 3 ++- .../src/main/scala/org/apache/spark/sql/execution/subquery.scala | 3 ++- .../apache/spark/sql/execution/window/AggregateProcessor.scala| 2 +- .../org/apache/spark/sql/execution/window/WindowExecBase.scala| 8 .../src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala | 3 ++- .../scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala | 2 +- 12 files changed, 26 insertions(+), 22 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala index 3b8a70ffe94..17f3cfbda89 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala @@ -503,8 +503,8 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] { _.aggregateFunction.children.filterNot(_.foldable).toSet).distinct.length > 1) { // This is a sanity check. We should not reach here when we have multiple distinct // column sets. Our `RewriteDistinctAggregates` should take care this case. - sys.error("You hit a query analyzer bug. Please report your query to " + - "Spark user mailing list.") + throw new IllegalStateException( +"You hit a query analyzer bug. 
Please report your query to Spark user mailing list.") } // Ideally this should be done in `NormalizeFloatingNumbers`, but we do it here because diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala index 2bb3d48c145..143fb4cf960 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala @@ -539,8 +539,8 @@ case class DataSource( DataWritingCommand.propogateMetrics(sparkSession.sparkContext, resolved, metrics) // Replace the schema with that of the DataFrame we just wrote out to avoid re-inferring copy(userSpecifiedSchema = Some(outputColumns.toStructType.asNullable)).resolveRelation() - case _ => -sys.error(s"${providingClass.getCanonicalName} does not allow create table as select.") + case _ => throw new IllegalStateException( +s"${providingClass.getCanonicalName} does not allow create table as select.") } } @@ -556,8 +556,8 @@ case class DataSource( disallowWritingIntervals(data.schema.map(_.dataType), forbidAnsiIntervals = false)
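For contrast, what the two styles actually throw (illustrative function names only):

```
// sys.error throws a bare RuntimeException, which cannot be told apart from
// exceptions Spark raises deliberately; IllegalStateException is what the
// INTERNAL_ERROR wrapping from SPARK-39164 catches and classifies.
def oldStyle(): Nothing = sys.error("You hit a query analyzer bug.")
def newStyle(): Nothing = throw new IllegalStateException("You hit a query analyzer bug.")
```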
[spark] branch master updated (ab1bceff228 -> cdd33e83c39)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from ab1bceff228 [SPARK-39159][SQL] Add new Dataset API for Offset
     add cdd33e83c39 [SPARK-39175][SQL] Provide runtime error query context for Cast when WSCG is off

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/Cast.scala      | 67 --
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 27 -
 2 files changed, 64 insertions(+), 30 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-39175][SQL] Provide runtime error query context for Cast when WSCG is off
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 27c03e5741a [SPARK-39175][SQL] Provide runtime error query context for Cast when WSCG is off 27c03e5741a is described below commit 27c03e5741af25b7afacac727865e23f60ce61fa Author: Gengliang Wang AuthorDate: Fri May 13 17:46:33 2022 +0800 [SPARK-39175][SQL] Provide runtime error query context for Cast when WSCG is off ### What changes were proposed in this pull request? Similar to https://github.com/apache/spark/pull/36525, this PR provides runtime error query context for the Cast expression when WSCG is off. ### Why are the changes needed? Enhance the runtime error query context of Cast expression. After changes, it works when the whole stage codegen is not available. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes #36535 from gengliangwang/fixCastContext. Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang (cherry picked from commit cdd33e83c3919c4475e2e1ef387acb604bea81b9) Signed-off-by: Gengliang Wang --- .../spark/sql/catalyst/expressions/Cast.scala | 67 -- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 27 - 2 files changed, 64 insertions(+), 30 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala index 335a34514c2..17d571a70f2 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala @@ -277,7 +277,10 @@ object Cast { } } -abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression with NullIntolerant { +abstract class CastBase extends UnaryExpression +with TimeZoneAwareExpression +with NullIntolerant +with SupportQueryContext { def child: Expression @@ -307,6 +310,12 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit protected def ansiEnabled: Boolean + override def initQueryContext(): String = if (ansiEnabled) { +origin.context + } else { +"" + } + // When this cast involves TimeZone, it's only resolved if the timeZoneId is set; // Otherwise behave like Expression.resolved. 
override lazy val resolved: Boolean = @@ -467,7 +476,7 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit false } else { if (ansiEnabled) { -throw QueryExecutionErrors.invalidInputSyntaxForBooleanError(s, origin.context) +throw QueryExecutionErrors.invalidInputSyntaxForBooleanError(s, queryContext) } else { null } @@ -499,7 +508,7 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit case StringType => buildCast[UTF8String](_, utfs => { if (ansiEnabled) { - DateTimeUtils.stringToTimestampAnsi(utfs, zoneId, origin.context) + DateTimeUtils.stringToTimestampAnsi(utfs, zoneId, queryContext) } else { DateTimeUtils.stringToTimestamp(utfs, zoneId).orNull } @@ -524,14 +533,14 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit // TimestampWritable.doubleToTimestamp case DoubleType => if (ansiEnabled) { -buildCast[Double](_, d => doubleToTimestampAnsi(d, origin.context)) +buildCast[Double](_, d => doubleToTimestampAnsi(d, queryContext)) } else { buildCast[Double](_, d => doubleToTimestamp(d)) } // TimestampWritable.floatToTimestamp case FloatType => if (ansiEnabled) { -buildCast[Float](_, f => doubleToTimestampAnsi(f.toDouble, origin.context)) +buildCast[Float](_, f => doubleToTimestampAnsi(f.toDouble, queryContext)) } else { buildCast[Float](_, f => doubleToTimestamp(f.toDouble)) } @@ -541,7 +550,7 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit case StringType => buildCast[UTF8String](_, utfs => { if (ansiEnabled) { - DateTimeUtils.stringToTimestampWithoutTimeZoneAnsi(utfs, origin.context) + DateTimeUtils.stringToTimestampWithoutTimeZoneAnsi(utfs, queryContext) } else { DateTimeUtils.stringToTimestampWithoutTimeZone(utfs).orNull } @@ -574,7 +583,7 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit private[this] def castToDate(from: DataType): Any => Any = from match { case StringType => if
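A reproduction sketch of the kind of cast failure this affects, assuming a SparkSession named `spark`; the failing call is commented out so the snippet runs cleanly:

```
spark.conf.set("spark.sql.codegen.wholeStage", "false")
spark.conf.set("spark.sql.ansi.enabled", "true")

// Under ANSI mode this cast fails with INVALID_SYNTAX_FOR_CAST, and the error
// message now includes the "== SQL ==" query context even with codegen disabled:
// spark.sql("SELECT CAST('abc' AS INT)").collect()
```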
[spark] branch master updated (e336567c8a9 -> ab1bceff228)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from e336567c8a9 [SPARK-39166][SQL] Provide runtime error query context for binary arithmetic when WSCG is off
     add ab1bceff228 [SPARK-39159][SQL] Add new Dataset API for Offset

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/CheckAnalysis.scala      |  7 ---
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala |  7 ---
 .../main/scala/org/apache/spark/sql/Dataset.scala  | 10 +
 .../org/apache/spark/sql/DataFrameSuite.scala      | 24 ++
 4 files changed, 34 insertions(+), 14 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
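The Dataset API added by SPARK-39159 above can be exercised as follows (a usage sketch, assuming a SparkSession named `spark`):

```
// Dataset.offset(n) skips the first n rows of the result, complementing limit(n).
val df = spark.range(10).toDF("id")
df.offset(3).show()           // skips the first three rows
df.offset(3).limit(2).show()  // a simple offset/limit pagination-style combination
```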