[ https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicholas Chammas updated SPARK-46810:
-------------------------------------
    Description:
We use inconsistent terminology when talking about error classes. I'd like to get some clarity on that before contributing any potential improvements to this part of the documentation.

Consider [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html]. It has several key pieces of hierarchical information that have inconsistent names throughout our documentation and codebase:
* 42
** K01
*** INCOMPLETE_TYPE_DEFINITION
**** ARRAY
**** MAP
**** STRUCT

What are the names of these different levels of information?

Some examples of inconsistent terminology:
* [Over here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation] we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION we call that an "error class". So what exactly is a class, the 42 or the INCOMPLETE_TYPE_DEFINITION?
* [Over here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122] we call K01 the "subclass". But [over here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467] we call ARRAY, MAP, and STRUCT the subclasses. And on the main page for INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". So what exactly is a subclass?
* [On this page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition] we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other places we refer to it as an "error class".

I don't think we should leave the status quo as it is. I see a couple of ways to fix this.

h1. Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition"

One solution is to use the following terms:
* Error class: 42
* Error sub-class: K01
* Error state: 42K01
* Error condition: INCOMPLETE_TYPE_DEFINITION
* Error sub-condition: ARRAY, MAP, STRUCT

Pros:
* This terminology seems (to me at least) the most natural and intuitive.
* It may also match the SQL standard.

Cons:
* We use {{errorClass}} [all over our codebase|https://github.com/apache/spark/blob/15c9ec7cbbbba3b66ec413b7964a374cb9508a80/common/utils/src/main/scala/org/apache/spark/SparkException.scala#L30] – literally in thousands of places – to refer to INCOMPLETE_TYPE_DEFINITION.
** It's probably not practical to update all these usages to say {{errorCondition}} instead, so if we go with this approach there will be a divide between the terminology we use in user-facing documentation and the terminology the codebase uses.
** We can perhaps rename the existing {{error-classes.json}} to {{error-conditions.json}} but clarify the reason for this divide in the documentation for {{ErrorClassesJsonReader}}.

h1. Option 2: 42 becomes an "Error Category"

Another approach is to use the following terms:
* Error category: 42
* Error sub-category: K01
* Error state: 42K01
* Error class: INCOMPLETE_TYPE_DEFINITION
* Error sub-classes: ARRAY, MAP, STRUCT

We should not use "error condition" if one of the above terms more accurately describes what we are talking about.

Side note: With this terminology, I believe talking about error categories and sub-categories in front of users is not helpful. I don't think anybody cares what "42" by itself means, or what "K01" by itself means. Accordingly, we should limit how much we talk about these concepts in the user-facing documentation.

  was:
We use inconsistent terminology when talking about error classes. I'd like to get some clarity on that before contributing any potential improvements to this part of the documentation.

Consider [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html]. It has several key pieces of hierarchical information that have inconsistent names throughout our documentation and codebase:
* 42
** K01
*** INCOMPLETE_TYPE_DEFINITION
**** ARRAY
**** MAP
**** STRUCT

What are the names of these different levels of information?

Some examples of inconsistent terminology:
* [Over here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation] we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION we call that an "error class". So what exactly is a class, the 42 or the INCOMPLETE_TYPE_DEFINITION?
* [Over here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122] we call K01 the "subclass". But [over here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467] we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". So what exactly is a subclass?
* [On this page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition] we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other places we refer to it as an "error class".

I personally like the terminology "error condition", but as we are already using "error class" very heavily throughout the codebase to refer to something like INCOMPLETE_TYPE_DEFINITION, I don't think it's practical to change at this point.

To rationalize the different terms we are using, I propose the following terminology, which we should use consistently throughout our code and documentation:
* Error category: 42
* Error sub-category: K01
* Error state: 42K01
* Error class: INCOMPLETE_TYPE_DEFINITION
* Error sub-classes: ARRAY, MAP, STRUCT

We should not use "error condition" if one of the above terms more accurately describes what we are talking about.

Side note: With this terminology, I believe talking about error categories and sub-categories in front of users is not helpful. I don't think anybody cares what "42" by itself means, or what "K01" by itself means. Accordingly, we should limit how much we talk about these concepts in the user-facing documentation.


> Clarify error class terminology
> -------------------------------
>
>                 Key: SPARK-46810
>                 URL: https://issues.apache.org/jira/browse/SPARK-46810
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, SQL
>    Affects Versions: 4.0.0
>            Reporter: Nicholas Chammas
>            Priority: Minor
>              Labels: pull-request-available
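
For readers who find the hierarchy easier to follow in code, here is a minimal sketch of the levels discussed in this issue, named with the Option 1 terms. The {{ErrorCondition}} type and its field names are hypothetical and exist only for illustration; they are not part of Spark. The one assumption about the data itself is that a five-character SQLSTATE such as 42K01 splits into a two-character class and a three-character subclass.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ErrorCondition:
    """Hypothetical model of one error entry, using the Option 1 names."""

    name: str                             # error condition, e.g. INCOMPLETE_TYPE_DEFINITION
    sql_state: str                        # error state, e.g. 42K01
    sub_conditions: Tuple[str, ...] = ()  # error sub-conditions, e.g. ARRAY, MAP, STRUCT

    @property
    def error_class(self) -> str:
        # First two characters of the SQLSTATE, e.g. "42".
        return self.sql_state[:2]

    @property
    def error_sub_class(self) -> str:
        # Remaining three characters of the SQLSTATE, e.g. "K01".
        return self.sql_state[2:]


condition = ErrorCondition(
    name="INCOMPLETE_TYPE_DEFINITION",
    sql_state="42K01",
    sub_conditions=("ARRAY", "MAP", "STRUCT"),
)
print(condition.error_class, condition.error_sub_class, condition.name)
```

Under Option 2 the same structure would apply with different names: {{error_class}} would become the error category, {{error_sub_class}} the error sub-category, and the entry itself would be called an error class with sub-classes.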