This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 7e82e1bc43e  [SPARK-45581] Make SQLSTATE mandatory
7e82e1bc43e is described below

commit 7e82e1bc43e0297c3036d802b3a151d2b93db2f6
Author: srielau <se...@rielau.com>
AuthorDate: Wed Oct 18 11:04:44 2023 -0700

    [SPARK-45581] Make SQLSTATE mandatory

    ### What changes were proposed in this pull request?

    We propose to make SQLSTATE a mandatory field when using error classes in the new error framework.

    ### Why are the changes needed?

    Being able to rely on the existence of SQLSTATEs allows easier classification of errors as well as usage of tooling to intercept SQLSTATEs.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    A new test was added to SparkThrowableSuite to enforce SQLSTATE existence.

    ### Was this patch authored or co-authored using generative AI tooling?

    No

    Closes #43412 from srielau/SPARK-45581-make-SQLSTATEs-mandatory.

    Authored-by: srielau <se...@rielau.com>
    Signed-off-by: Gengliang Wang <gengli...@apache.org>
---
 common/utils/src/main/resources/error/README.md    | 26 ++++++++++++++--
 .../src/main/resources/error/error-classes.json    |  9 +++++---
 .../org/apache/spark/ErrorClassesJSONReader.scala  |  8 +++----
 .../org/apache/spark/SparkThrowableSuite.scala     | 16 ++++++++-----
 docs/sql-error-conditions.md                       |  6 ++---
 5 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/common/utils/src/main/resources/error/README.md b/common/utils/src/main/resources/error/README.md
index ac388c29250..8d8529bea56 100644
--- a/common/utils/src/main/resources/error/README.md
+++ b/common/utils/src/main/resources/error/README.md
@@ -1,6 +1,6 @@
 # Guidelines
 
-To throw a standardized user-facing error or exception, developers should specify the error class
+To throw a standardized user-facing error or exception, developers should specify the error class, a SQLSTATE,
 and message parameters rather than an arbitrary error message.
 
 ## Usage
@@ -10,7 +10,7 @@ and message parameters rather than an arbitrary error message.
    If true, use the error class `INTERNAL_ERROR` and skip to step 4.
 2. Check if an appropriate error class already exists in `error-classes.json`.
    If true, use the error class and skip to step 4.
-3. Add a new class to `error-classes.json`; keep in mind the invariants below.
+3. Add a new class with a new or existing SQLSTATE to `error-classes.json`; keep in mind the invariants below.
 4. Check if the exception type already extends `SparkThrowable`.
    If true, skip to step 6.
 5. Mix `SparkThrowable` into the exception.
@@ -26,9 +26,9 @@ Throw with arbitrary error message:
 
 `error-classes.json`
 
-    "PROBLEM_BECAUSE": {
-      "message": ["Problem <problem> because <cause>"],
-      "sqlState": "XXXXX"
+    "PROBLEM_BECAUSE" : {
+      "message" : ["Problem <problem> because <cause>"],
+      "sqlState" : "XXXXX"
     }
 
 `SparkException.scala`
@@ -70,6 +70,8 @@ Error classes are a succinct, human-readable representation of the error categor
 
 An uncategorized errors can be assigned to a legacy error class with the prefix `_LEGACY_ERROR_TEMP_` and an unused sequential number, for instance `_LEGACY_ERROR_TEMP_0053`.
 
+You should not introduce new uncategorized errors. Instead, convert them to proper errors whenever encountering them in new code.
+
 #### Invariants
 
 - Unique
@@ -79,7 +81,10 @@ An uncategorized errors can be assigned to a legacy error class with the prefix
 ### Message
 
 Error messages provide a descriptive, human-readable representation of the error.
-The message format accepts string parameters via the C-style printf syntax.
+The message format accepts string parameters via the HTML tag syntax: e.g. <relationName>.
+
+The values passed to the message should not themselves be messages.
+They should be: runtime-values, keywords, identifiers, or other values that are not translated.
 
 The quality of the error message should match the
 [guidelines](https://spark.apache.org/error-message-guidelines.html).
@@ -90,21 +95,24 @@ The quality of the error message should match the
 
 ### SQLSTATE
 
-SQLSTATE is an optional portable error identifier across SQL engines.
+SQLSTATE is a mandatory portable error identifier across SQL engines.
 SQLSTATE comprises a 2-character class value followed by a 3-character subclass value.
 Spark prefers to re-use existing SQLSTATEs, preferably used by multiple vendors.
 For extension Spark claims the 'K**' subclass range.
 If a new class is needed it will also claim the 'K0' class.
 
+Internal errors should use the 'XX' class. You can subdivide internal errors by component.
+For example: The existing 'XXKD0' is used for an internal analyzer error.
+
 #### Invariants
 
-- Consistent across releases
+- Consistent across releases unless the error is internal.
 
 #### ANSI/ISO standard
 
 The following SQLSTATEs are collated from:
 - SQL2016
-- DB2 zOS
+- DB2 zOS/LUW
 - PostgreSQL 15
 - Oracle 12 (last published)
 - SQL Server
diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index 7350d0331c4..9e442f4c666 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2048,7 +2048,8 @@
   "INVALID_SQL_ARG" : {
     "message" : [
       "The argument <name> of `sql()` is invalid. Consider to replace it either by a SQL literal or by collection constructor functions such as `map()`, `array()`, `struct()`."
-    ]
+    ],
+    "sqlState" : "42K08"
   },
   "INVALID_SQL_SYNTAX" : {
     "message" : [
@@ -2370,7 +2371,8 @@
   "MULTI_SOURCES_UNSUPPORTED_FOR_EXPRESSION" : {
     "message" : [
      "The expression <expr> does not support more than one source."
-    ]
+    ],
+    "sqlState" : "42K0E"
   },
   "MULTI_UDF_INTERFACE_ERROR" : {
     "message" : [
@@ -3689,7 +3691,8 @@
       "1. use typed Scala UDF APIs(without return type parameter), e.g. `udf((x: Int) => x)`.",
       "2. use Java UDF APIs, e.g. `udf(new UDF1[String, Integer] { override def call(s: String): Integer = s.length() }, IntegerType)`, if input types are all non primitive.",
       "3. set \"spark.sql.legacy.allowUntypedScalaUDF\" to \"true\" and use this API with caution."
-    ]
+    ],
+    "sqlState" : "42K0E"
   },
   "USER_RAISED_EXCEPTION" : {
     "message" : [
diff --git a/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala b/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala
index c59e1e376ea..083064bfe23 100644
--- a/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala
+++ b/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala
@@ -97,9 +97,9 @@ class ErrorClassesJsonReader(jsonFileURLs: Seq[URL]) {
     val errorClasses = errorClass.split("\\.")
     errorClasses match {
       case Array(mainClass) => errorInfoMap.contains(mainClass)
-      case Array(mainClass, subClass) => errorInfoMap.get(mainClass).map { info =>
+      case Array(mainClass, subClass) => errorInfoMap.get(mainClass).exists { info =>
         info.subClass.get.contains(subClass)
-      }.getOrElse(false)
+      }
       case _ => false
     }
   }
@@ -130,7 +130,7 @@ private object ErrorClassesJsonReader {
  *
  * @param sqlState SQLSTATE associated with this class.
  * @param subClass SubClass associated with this class.
- * @param message C-style message format compatible with printf.
+ * @param message Message format with optional placeholders (e.g. <parm>).
  *                The error message is constructed by concatenating the lines with newlines.
  */
 private case class ErrorInfo(
@@ -145,7 +145,7 @@ private case class ErrorInfo(
 /**
  * Information associated with an error subclass.
  *
- * @param message C-style message format compatible with printf.
+ * @param message Message format with optional placeholders (e.g. <parm>).
  *                The error message is constructed by concatenating the lines with newlines.
  */
 private case class ErrorSubInfo(message: Seq[String]) {
diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
index bb9fe79071c..a4120637b69 100644
--- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
@@ -66,15 +66,12 @@ class SparkThrowableSuite extends SparkFunSuite {
 
   def checkIfUnique(ss: Seq[Any]): Unit = {
     val dups = ss.groupBy(identity).mapValues(_.size).filter(_._2 > 1).keys.toSeq
-    assert(dups.isEmpty)
+    assert(dups.isEmpty, s"Duplicate error classes: ${dups.mkString(", ")}")
   }
 
   def checkCondition(ss: Seq[String], fx: String => Boolean): Unit = {
     ss.foreach { s =>
-      if (!fx(s)) {
-        print(s)
-      }
-      assert(fx(s))
+      assert(fx(s), s)
     }
   }
 
@@ -116,6 +113,15 @@ class SparkThrowableSuite extends SparkFunSuite {
     }
   }
 
+  test("SQLSTATE is mandatory") {
+    val errorClassesNoSqlState = errorReader.errorInfoMap.filter {
+      case (error: String, info: ErrorInfo) =>
+        !error.startsWith("_LEGACY_ERROR_TEMP") && info.sqlState.isEmpty
+    }.keys.toSeq
+    assert(errorClassesNoSqlState.isEmpty,
+      s"Error classes without SQLSTATE: ${errorClassesNoSqlState.mkString(", ")}")
+  }
+
   test("SQLSTATE invariants") {
     val sqlStates = errorReader.errorInfoMap.values.toSeq.flatMap(_.sqlState)
     val errorClassReadMe = Utils.getSparkClassLoader.getResource("error/README.md")
diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index 78904780054..d8f5193c9bc 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -1178,7 +1178,7 @@ Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include
 
 ### INVALID_SQL_ARG
 
-SQLSTATE: none assigned
+[SQLSTATE: 42K08](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
 
 The argument `<name>` of `sql()` is invalid. Consider to replace it either by a SQL literal or by collection constructor functions such as `map()`, `array()`, `struct()`.
 
@@ -1344,7 +1344,7 @@ The query does not include a GROUP BY clause. Add GROUP BY or turn it into the w
 
 ### MULTI_SOURCES_UNSUPPORTED_FOR_EXPRESSION
 
-SQLSTATE: none assigned
+[SQLSTATE: 42K0E](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
 
 The expression `<expr>` does not support more than one source.
 
@@ -2191,7 +2191,7 @@ Literals of the type `<unsupportedType>` are not supported. Supported types are
 
 ### UNTYPED_SCALA_UDF
 
-SQLSTATE: none assigned
+[SQLSTATE: 42K0E](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
 
 You're using untyped Scala UDF, which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument, and the closure will see the default value of the Java type for the null argument, e.g. `udf((x: Int) => x, IntegerType)`, the result is 0 for null input. To get rid of this error, you could:
 1. use typed Scala UDF APIs(without return type parameter), e.g. `udf((x: Int) => x)`.
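Appended for illustration only, not part of this commit: a minimal, self-contained Scala sketch of the invariant the new SparkThrowableSuite test enforces, namely that every non-legacy error class must declare a SQLSTATE. The object name, sample entries, and helper method below are hypothetical stand-ins for the parsed error-classes.json data.

    // Illustrative sketch; mirrors the new rule that non-legacy error classes need a SQLSTATE.
    object SqlStateMandatoryCheck {

      // Hypothetical stand-in for parsed error-classes.json: class name -> optional "sqlState" field.
      val errorClasses: Map[String, Option[String]] = Map(
        "INVALID_SQL_ARG" -> Some("42K08"),
        "UNTYPED_SCALA_UDF" -> Some("42K0E"),
        "_LEGACY_ERROR_TEMP_0053" -> None // legacy classes are exempt from the rule
      )

      // Names of non-legacy error classes that are missing a SQLSTATE.
      def classesMissingSqlState(classes: Map[String, Option[String]]): Seq[String] =
        classes.collect {
          case (name, sqlState) if !name.startsWith("_LEGACY_ERROR_TEMP") && sqlState.isEmpty => name
        }.toSeq

      def main(args: Array[String]): Unit = {
        val missing = classesMissingSqlState(errorClasses)
        assert(missing.isEmpty, s"Error classes without SQLSTATE: ${missing.mkString(", ")}")
        println("All non-legacy error classes define a SQLSTATE")
      }
    }

The actual test added in the diff above applies the same predicate to errorReader.errorInfoMap inside SparkThrowableSuite.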