This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 7e82e1bc43e [SPARK-45581] Make SQLSTATE mandatory
7e82e1bc43e is described below

commit 7e82e1bc43e0297c3036d802b3a151d2b93db2f6
Author: srielau <se...@rielau.com>
AuthorDate: Wed Oct 18 11:04:44 2023 -0700

    [SPARK-45581] Make SQLSTATE mandatory
    
    ### What changes were proposed in this pull request?
    
    We propose to make SQLSTATE a mandatory field when using error classes in the new error framework.
    
    ### Why are the changes needed?
    
    Being able to rely on the presence of a SQLSTATE makes it easier to classify errors and enables tooling that intercepts SQLSTATEs.
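
    For illustration only (not part of this patch; `spark` is an assumed SparkSession and `handleSyntaxError` a made-up handler), a caller can then reliably branch on the SQLSTATE class value:

        import org.apache.spark.SparkThrowable

        try {
          spark.sql("SELEC 1")  // malformed SQL
        } catch {
          case e: SparkThrowable if e.getSqlState != null &&
              e.getSqlState.startsWith("42") =>
            // Class '42': syntax error or access rule violation.
            handleSyntaxError(e)
        }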
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    A new test was added to SparkThrowableSuite to enforce SQLSTATE existence.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #43412 from srielau/SPARK-45581-make-SQLSTATEs-mandatory.
    
    Authored-by: srielau <se...@rielau.com>
    Signed-off-by: Gengliang Wang <gengli...@apache.org>
---
 common/utils/src/main/resources/error/README.md    | 26 ++++++++++++++--------
 .../src/main/resources/error/error-classes.json    |  9 +++++---
 .../org/apache/spark/ErrorClassesJSONReader.scala  |  8 +++----
 .../org/apache/spark/SparkThrowableSuite.scala     | 16 ++++++++-----
 docs/sql-error-conditions.md                       |  6 ++---
 5 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/common/utils/src/main/resources/error/README.md b/common/utils/src/main/resources/error/README.md
index ac388c29250..8d8529bea56 100644
--- a/common/utils/src/main/resources/error/README.md
+++ b/common/utils/src/main/resources/error/README.md
@@ -1,6 +1,6 @@
 # Guidelines
 
-To throw a standardized user-facing error or exception, developers should specify the error class
+To throw a standardized user-facing error or exception, developers should specify the error class, a SQLSTATE,
 and message parameters rather than an arbitrary error message.
 
 ## Usage
@@ -10,7 +10,7 @@ and message parameters rather than an arbitrary error message.
    If true, use the error class `INTERNAL_ERROR` and skip to step 4.
 2. Check if an appropriate error class already exists in `error-classes.json`.
    If true, use the error class and skip to step 4.
-3. Add a new class to `error-classes.json`; keep in mind the invariants below.
+3. Add a new class with a new or existing SQLSTATE to `error-classes.json`; keep in mind the invariants below.
 4. Check if the exception type already extends `SparkThrowable`.
    If true, skip to step 6.
 5. Mix `SparkThrowable` into the exception.
@@ -26,9 +26,9 @@ Throw with arbitrary error message:
 
 `error-classes.json`
 
-    "PROBLEM_BECAUSE": {
-      "message": ["Problem <problem> because <cause>"],
-      "sqlState": "XXXXX"
+    "PROBLEM_BECAUSE" : {
+      "message" : ["Problem <problem> because <cause>"],
+      "sqlState" : "XXXXX"
     }
 
 `SparkException.scala`
@@ -70,6 +70,8 @@ Error classes are a succinct, human-readable representation of the error category
 
 An uncategorized error can be assigned to a legacy error class with the prefix `_LEGACY_ERROR_TEMP_` and an unused sequential number, for instance `_LEGACY_ERROR_TEMP_0053`.
 
+You should not introduce new uncategorized errors. Instead, convert them to proper errors whenever you encounter them in new code.
+
 #### Invariants
 
 - Unique
@@ -79,7 +81,10 @@ An uncategorized error can be assigned to a legacy error class with the prefix
 ### Message
 
 Error messages provide a descriptive, human-readable representation of the error.
-The message format accepts string parameters via the C-style printf syntax.
+The message format accepts string parameters via the HTML tag syntax: e.g. <relationName>.
+
+The values passed to the message should not themselves be messages.
+They should be: runtime values, keywords, identifiers, or other values that are not translated.
 
 The quality of the error message should match the
 [guidelines](https://spark.apache.org/error-message-guidelines.html).
@@ -90,21 +95,24 @@ The quality of the error message should match the
 
 ### SQLSTATE
 
-SQLSTATE is an optional portable error identifier across SQL engines.
+SQLSTATE is a mandatory portable error identifier across SQL engines.
 SQLSTATE comprises a 2-character class value followed by a 3-character subclass value.
 Spark prefers to re-use existing SQLSTATEs, preferably ones used by multiple vendors.
 For extensions, Spark claims the 'K**' subclass range.
 If a new class is needed, it will also claim the 'K0' class.
 
+Internal errors should use the 'XX' class. You can subdivide internal errors by component.
+For example, the existing 'XXKD0' is used for an internal analyzer error.
+
 #### Invariants
 
-- Consistent across releases
+- Consistent across releases unless the error is internal.
 
 #### ANSI/ISO standard
 
 The following SQLSTATEs are collated from:
 - SQL2016
-- DB2 zOS
+- DB2 zOS/LUW
 - PostgreSQL 15
 - Oracle 12 (last published)
 - SQL Server
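
(Editorial aside, not part of the patch: a worked example of the class/subclass split described in the README hunk above, as a small Scala sketch.)

    // A SQLSTATE is a 2-character class value plus a 3-character subclass value.
    val (sqlClass, sqlSubclass) = "42K08".splitAt(2)
    // sqlClass    == "42"  -> class '42': syntax error or access rule violation
    // sqlSubclass == "K08" -> within Spark's claimed 'K**' extension range
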
diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json
index 7350d0331c4..9e442f4c666 100644
--- a/common/utils/src/main/resources/error/error-classes.json
+++ b/common/utils/src/main/resources/error/error-classes.json
@@ -2048,7 +2048,8 @@
   "INVALID_SQL_ARG" : {
     "message" : [
       "The argument <name> of `sql()` is invalid. Consider to replace it 
either by a SQL literal or by collection constructor functions such as `map()`, 
`array()`, `struct()`."
-    ]
+    ],
+    "sqlState" : "42K08"
   },
   "INVALID_SQL_SYNTAX" : {
     "message" : [
@@ -2370,7 +2371,8 @@
   "MULTI_SOURCES_UNSUPPORTED_FOR_EXPRESSION" : {
     "message" : [
       "The expression <expr> does not support more than one source."
-    ]
+    ],
+    "sqlState" : "42K0E"
   },
   "MULTI_UDF_INTERFACE_ERROR" : {
     "message" : [
@@ -3689,7 +3691,8 @@
       "1. use typed Scala UDF APIs(without return type parameter), e.g. 
`udf((x: Int) => x)`.",
       "2. use Java UDF APIs, e.g. `udf(new UDF1[String, Integer] { override 
def call(s: String): Integer = s.length() }, IntegerType)`, if input types are 
all non primitive.",
       "3. set \"spark.sql.legacy.allowUntypedScalaUDF\" to \"true\" and use 
this API with caution."
-    ]
+    ],
+    "sqlState" : "42K0E"
   },
   "USER_RAISED_EXCEPTION" : {
     "message" : [
diff --git a/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala b/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala
index c59e1e376ea..083064bfe23 100644
--- a/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala
+++ b/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala
@@ -97,9 +97,9 @@ class ErrorClassesJsonReader(jsonFileURLs: Seq[URL]) {
     val errorClasses = errorClass.split("\\.")
     errorClasses match {
       case Array(mainClass) => errorInfoMap.contains(mainClass)
-      case Array(mainClass, subClass) => errorInfoMap.get(mainClass).map { info =>
+      case Array(mainClass, subClass) => errorInfoMap.get(mainClass).exists { info =>
         info.subClass.get.contains(subClass)
-      }.getOrElse(false)
+      }
       case _ => false
     }
   }
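
(Editorial aside, not part of the patch: `Option.exists(p)` is the idiomatic equivalent of the replaced `map(p).getOrElse(false)` chain; a minimal sketch.)

    val opt: Option[Int] = Some(3)
    assert(opt.map(_ > 2).getOrElse(false) == opt.exists(_ > 2)) // both true
    assert(!Option.empty[Int].exists(_ > 2))                     // None yields false
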
@@ -130,7 +130,7 @@ private object ErrorClassesJsonReader {
  *
  * @param sqlState SQLSTATE associated with this class.
  * @param subClass SubClass associated with this class.
- * @param message C-style message format compatible with printf.
+ * @param message Message format with optional placeholders (e.g. &lt;parm&gt;).
  *                The error message is constructed by concatenating the lines with newlines.
  */
 private case class ErrorInfo(
@@ -145,7 +145,7 @@ private case class ErrorInfo(
 /**
  * Information associated with an error subclass.
  *
- * @param message C-style message format compatible with printf.
+ * @param message Message format with optional placeholders (e.g. &lt;parm&gt;).
  *                The error message is constructed by concatenating the lines with newlines.
  */
 private case class ErrorSubInfo(message: Seq[String]) {
diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
index bb9fe79071c..a4120637b69 100644
--- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala
@@ -66,15 +66,12 @@ class SparkThrowableSuite extends SparkFunSuite {
 
   def checkIfUnique(ss: Seq[Any]): Unit = {
     val dups = ss.groupBy(identity).mapValues(_.size).filter(_._2 > 1).keys.toSeq
-    assert(dups.isEmpty)
+    assert(dups.isEmpty, s"Duplicate error classes: ${dups.mkString(", ")}")
   }
 
   def checkCondition(ss: Seq[String], fx: String => Boolean): Unit = {
     ss.foreach { s =>
-      if (!fx(s)) {
-        print(s)
-      }
-      assert(fx(s))
+      assert(fx(s), s)
     }
   }
 
@@ -116,6 +113,15 @@ class SparkThrowableSuite extends SparkFunSuite {
     }
   }
 
+  test("SQLSTATE is mandatory") {
+    val errorClassesNoSqlState = errorReader.errorInfoMap.filter {
+      case (error: String, info: ErrorInfo) =>
+        !error.startsWith("_LEGACY_ERROR_TEMP") && info.sqlState.isEmpty
+    }.keys.toSeq
+    assert(errorClassesNoSqlState.isEmpty,
+      s"Error classes without SQLSTATE: ${errorClassesNoSqlState.mkString(", 
")}")
+  }
+
   test("SQLSTATE invariants") {
     val sqlStates = errorReader.errorInfoMap.values.toSeq.flatMap(_.sqlState)
     val errorClassReadMe = Utils.getSparkClassLoader.getResource("error/README.md")
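
(Editorial aside, not part of the patch: assuming Spark's standard sbt wrapper, the suite with the new check can be run locally with something like:)

    build/sbt "core/testOnly org.apache.spark.SparkThrowableSuite"
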
diff --git a/docs/sql-error-conditions.md b/docs/sql-error-conditions.md
index 78904780054..d8f5193c9bc 100644
--- a/docs/sql-error-conditions.md
+++ b/docs/sql-error-conditions.md
@@ -1178,7 +1178,7 @@ Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include
 
 ### INVALID_SQL_ARG
 
-SQLSTATE: none assigned
+[SQLSTATE: 42K08](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
 
 The argument `<name>` of `sql()` is invalid. Consider to replace it either by a SQL literal or by collection constructor functions such as `map()`, `array()`, `struct()`.
 
@@ -1344,7 +1344,7 @@ The query does not include a GROUP BY clause. Add GROUP BY or turn it into the w
 
 ### MULTI_SOURCES_UNSUPPORTED_FOR_EXPRESSION
 
-SQLSTATE: none assigned
+[SQLSTATE: 42K0E](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
 
 The expression `<expr>` does not support more than one source.
 
@@ -2191,7 +2191,7 @@ Literals of the type `<unsupportedType>` are not supported. Supported types are
 
 ### UNTYPED_SCALA_UDF
 
-SQLSTATE: none assigned
+[SQLSTATE: 42K0E](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
 
 You're using untyped Scala UDF, which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument, and the closure will see the default value of the Java type for the null argument, e.g. `udf((x: Int) => x, IntegerType)`, the result is 0 for null input. To get rid of this error, you could:
 1. use typed Scala UDF APIs(without return type parameter), e.g. `udf((x: Int) => x)`.


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
