[ 
https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-46810:
-----------------------------------

    Assignee: Nicholas Chammas

> Clarify error class terminology
> -------------------------------
>
>                 Key: SPARK-46810
>                 URL: https://issues.apache.org/jira/browse/SPARK-46810
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, SQL
>    Affects Versions: 4.0.0
>            Reporter: Nicholas Chammas
>            Assignee: Nicholas Chammas
>            Priority: Minor
>              Labels: pull-request-available
>
> We use inconsistent terminology when talking about error classes. I'd like to 
> get some clarity on that before contributing any potential improvements to 
> this part of the documentation.
> Consider 
> [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html].
>  It has several key pieces of hierarchical information that have inconsistent 
> names throughout our documentation and codebase:
>  * 42
>  ** K01
>  *** INCOMPLETE_TYPE_DEFINITION
>  **** ARRAY
>  **** MAP
>  **** STRUCT
> What are the names of these different levels of information?
> Some examples of inconsistent terminology:
>  * [Over 
> here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation]
>  we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION 
> we call that an "error class". So what exactly is a class, the 42 or the 
> INCOMPLETE_TYPE_DEFINITION?
>  * [Over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122]
>  we call K01 the "subclass". But [over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467]
>  we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for 
> INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". 
> So what exactly is a subclass?
>  * [On this 
> page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition]
>  we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other 
> places we refer to it as an "error class".
> I don't think we should leave this status quo as-is. I see a couple of ways 
> to fix this.
> h1. Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition"
> One solution is to use the following terms:
>  * Error class: 42
>  * Error sub-class: K01
>  * Error state: 42K01
>  * Error condition: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-condition: ARRAY, MAP, STRUCT
> Pros: 
>  * This terminology seems (to me at least) the most natural and intuitive.
>  * It aligns most closely to the SQL standard.
> Cons:
>  * We use {{errorClass}} [all over our 
> codebase|https://github.com/apache/spark/blob/15c9ec7cbbbba3b66ec413b7964a374cb9508a80/common/utils/src/main/scala/org/apache/spark/SparkException.scala#L30]
>  – literally in thousands of places – to refer to strings like 
> INCOMPLETE_TYPE_DEFINITION.
>  ** It's probably not practical to update all these usages to say 
> {{errorCondition}} instead, so if we go with this approach there will be a 
> divide between the terminology we use in user-facing documentation vs. what 
> the code base uses.
>  ** We can perhaps rename the existing {{error-classes.json}} to 
> {{error-conditions.json}} but clarify the reason for this divide between code 
> and user docs in the documentation for {{ErrorClassesJsonReader}} .
> h1. Option 2: 42 becomes an "Error Category"
> Another approach is to use the following terminology:
>  * Error category: 42
>  * Error sub-category: K01
>  * Error state: 42K01
>  * Error class: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
>  * We continue to use "error class" as we do today in our code base.
>  * The change from calling "42" a "class" to a "category" is low impact and 
> may not show up in user-facing documentation at all. (See my side note below.)
> Cons:
>  * These terms do not align with the SQL standard.
>  * We will have to retire the term "error condition", which we have [already 
> used|https://github.com/apache/spark/blob/e7fb0ad68f73d0c1996b19c9e139d70dcc97a8c4/docs/sql-error-conditions.md]
>  in user-facing documentation.
> h1. Option 3: "Error Class" and "State Class"
>  * SQL state class: 42
>  * SQL state sub-class: K01
>  * SQL state: 42K01
>  * Error class: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
>  * We continue to use "error class" as we do today in our code base.
>  * The change from calling "42" a "class" to a "state class" is low impact 
> and may not show up in user-facing documentation at all. (See my side note 
> below.)
> Cons:
>  * "State class" vs. "Error class" is a bit confusing.
>  * These terms do not align with the SQL standard.
>  * We will have to retire the term "error condition", which we have [already 
> used|https://github.com/apache/spark/blob/e7fb0ad68f73d0c1996b19c9e139d70dcc97a8c4/docs/sql-error-conditions.md]
>  in user-facing documentation.
> —
> Side note: In any case, I believe talking about "42" and "K01" – regardless 
> of what we end up calling them – in front of users is not helpful. I don't 
> think anybody cares what "42" by itself means, or what "K01" by itself means. 
> Accordingly, we should limit how much we talk about these concepts in the 
> user-facing documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to