[jira] [Commented] (SPARK-46810) Clarify error class terminology

Serge Rielau (Jira) Sat, 27 Jan 2024 16:01:04 -0800


    [ 
https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811606#comment-17811606
 ]


Serge Rielau commented on SPARK-46810:
--------------------------------------

[~nchammas] 
ISO/IEC 9075-2:2016(E) 24.1 SQLSTATE
The character string value returned in an SQLSTATE parameter comprises a 
2-character class code followed by a 3-character subclass code, each with an 
implementation-defined character set that has a one-octet character encoding 
form and is restricted to <digit>s and <simple Latin upper case letter>s. Table 
38, “SQLSTATE class and subclass codes”, specifies the class code for each 
condition and the subclass code or codes for each class code.

Class codes that begin with one of the <digit>s '0', '1', '2', '3', or '4' or 
one of the <simple Latin upper case letter>s 'A', 'B', 'C', 'D', 'E', 'F', 'G', 
or 'H' are returned only for conditions defined in ISO/IEC 9075 or in any other 
International Standard. The range of such class codes is called 
standard-defined classes. Some such class codes are reserved for use by 
specific International Standards, as specified elsewhere in this Clause. 
Subclass codes associated with such classes that also begin with one of those 
13 characters are returned only for conditions defined in ISO/IEC 9075 or some 
other International Standard. The range of such subclass codes is called 
standard-defined subclasses. Subclass codes associated with such classes that 
begin with one of the <digit>s '5', '6', '7', '8', or '9' or one of the <simple 
Latin upper case letter>s 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 
'S', 'T', 'U', 'V', 'W', 'X', 'Y', or 'Z' are reserved for 
implementation-defined conditions and are called implementation- defined 
subclasses.

Class codes that begin with one of the <digit>s '5', '6', '7', '8', or '9' or 
one of the <simple Latin upper case letter>s 'I', 'J', 'K', 'L', 'M', 'N', 'O', 
'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', or 'Z' are reserved for 
implementation-defined exception conditions and are called 
implementation-defined classes. All subclass codes except '000', which means no 
subclass, associated with such classes are reserved for implementation-defined 
conditions and are called implementation-defined subclasses. An 
implementation-defined completion condition shall be indicated by returning an 
implementation-defined subclass in conjunction with one of the classes 
successful completion, warning, or no data.

I'm fine with the renaming of error class to error condition and subcondition.

> Clarify error class terminology
> -------------------------------
>
>                 Key: SPARK-46810
>                 URL: https://issues.apache.org/jira/browse/SPARK-46810
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, SQL
>    Affects Versions: 4.0.0
>            Reporter: Nicholas Chammas
>            Priority: Minor
>              Labels: pull-request-available
>
> We use inconsistent terminology when talking about error classes. I'd like to 
> get some clarity on that before contributing any potential improvements to 
> this part of the documentation.
> Consider 
> [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html].
>  It has several key pieces of hierarchical information that have inconsistent 
> names throughout our documentation and codebase:
>  * 42
>  ** K01
>  *** INCOMPLETE_TYPE_DEFINITION
>  **** ARRAY
>  **** MAP
>  **** STRUCT
> What are the names of these different levels of information?
> Some examples of inconsistent terminology:
>  * [Over 
> here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation]
>  we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION 
> we call that an "error class". So what exactly is a class, the 42 or the 
> INCOMPLETE_TYPE_DEFINITION?
>  * [Over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122]
>  we call K01 the "subclass". But [over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467]
>  we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for 
> INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". 
> So what exactly is a subclass?
>  * [On this 
> page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition]
>  we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other 
> places we refer to it as an "error class".
> I don't think we should leave this status quo as-is. I see a couple of ways 
> to fix this.
> h1. Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition"
> One solution is to use the following terms:
>  * Error class: 42
>  * Error sub-class: K01
>  * Error state: 42K01
>  * Error condition: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-condition: ARRAY, MAP, STRUCT
> Pros: 
>  * This terminology seems (to me at least) the most natural and intuitive.
>  * It may also match the SQL standard.
> Cons:
>  * We use {{errorClass}} [all over our 
> codebase|https://github.com/apache/spark/blob/15c9ec7cbbbba3b66ec413b7964a374cb9508a80/common/utils/src/main/scala/org/apache/spark/SparkException.scala#L30]
>  – literally in thousands of places – to refer to strings like 
> INCOMPLETE_TYPE_DEFINITION.
>  ** It's probably not practical to update all these usages to say 
> {{errorCondition}} instead, so if we go with this approach there will be a 
> divide between the terminology we use in user-facing documentation vs. what 
> the code base uses.
>  ** We can perhaps rename the existing {{error-classes.json}} to 
> {{error-conditions.json}} but clarify the reason for this divide between code 
> and user docs in the documentation for {{ErrorClassesJsonReader}} .
> h1. Option 2: 42 becomes an "Error Category"
> Another approach is to use the following terminology:
>  * Error category: 42
>  * Error sub-category: K01
>  * Error state: 42K01
>  * Error class: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
>  * We continue to use "error class" as we do today in our code base.
>  * The change from calling "42" a class to a category is low impact and may 
> not show up in user-facing documentation at all. (See my side note below.)
> Cons:
>  * These terms may not align with the SQL standard.
>  * We will have to retire the term "error condition", which we have [already 
> used|https://github.com/apache/spark/blob/e7fb0ad68f73d0c1996b19c9e139d70dcc97a8c4/docs/sql-error-conditions.md]
>  in user-facing documentation.
> —
> Side note: In either case, I believe talking about "42" and "K01" – 
> regardless of what we end up calling them – in front of users is not helpful. 
> I don't think anybody cares what "42" by itself means, or what "K01" by 
> itself means. Accordingly, we should limit how much we talk about these 
> concepts in the user-facing documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SPARK-46810) Clarify error class terminology

Reply via email to