[ https://issues.apache.org/jira/browse/SPARK-36094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karen Feng updated SPARK-36094:
-------------------------------
    Description: 
To improve auditing, reduce duplication, and improve the quality of error
messages thrown from Spark, we should group them in a single JSON file (as
discussed on the
[mailing list|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Add-error-IDs-td31126.html]
and introduced in
[SPARK-34920|#diff-d41e24da75af19647fadd76ad0b63ecb22b08c0004b07091e4603a30ec0fe013]).
In this file, each error message should be labeled with a consistent error
class and a SQLSTATE.
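For illustration, here is a minimal sketch of what such a file and a lookup over it might look like, assuming each entry maps an error class name to a message template (a list of lines with %s placeholders) and a SQLSTATE. The class names, messages, SQLSTATE values, and the helpers load_error_classes and format_message are hypothetical; this is not Spark's actual file layout or API.

```python
import json
from dataclasses import dataclass
from typing import Dict

# Hypothetical excerpt of an error-class JSON file. The layout (a "message"
# list of template lines plus a "sqlState") and every entry are assumptions
# made for illustration.
ERROR_CLASSES_JSON = """
{
  "DIVIDE_BY_ZERO": {
    "message": ["Division by zero."],
    "sqlState": "22012"
  },
  "UNRESOLVED_COLUMN": {
    "message": ["Cannot resolve column %s; available columns: %s."],
    "sqlState": "42000"
  }
}
"""


@dataclass(frozen=True)
class ErrorInfo:
    message_template: str
    sql_state: str


def load_error_classes(text: str) -> Dict[str, ErrorInfo]:
    """Parse the JSON text into a mapping of error class -> ErrorInfo."""
    raw = json.loads(text)
    return {
        name: ErrorInfo("\n".join(entry["message"]), entry["sqlState"])
        for name, entry in raw.items()
    }


def format_message(classes: Dict[str, ErrorInfo], error_class: str, *params: str) -> str:
    """Fill the template's %s placeholders and prefix the error class name."""
    info = classes[error_class]  # raises KeyError for an unregistered class
    return f"[{error_class}] " + (info.message_template % params)


ERROR_CLASSES = load_error_classes(ERROR_CLASSES_JSON)
print(format_message(ERROR_CLASSES, "UNRESOLVED_COLUMN", "`a`", "`b`, `c`"))
# [UNRESOLVED_COLUMN] Cannot resolve column `a`; available columns: `b`, `c`.
```

Keeping the message text and SQLSTATE in one data file means an auditor can review every user-facing error in one place instead of searching throw sites.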

We will start with the SQL component.
As a starting point, we can build on the exception grouping done in
[SPARK-33539|https://issues.apache.org/jira/browse/SPARK-33539]. In total,
there are ~1000 error messages to group, split across three files
(QueryCompilationErrors, QueryExecutionErrors, and QueryParsingErrors). If you
work on this ticket, please create a subtask so that the changes are easier to
review.

As a guideline, the error classes should be de-duplicated as much as possible 
to improve auditing.
We will improve error message quality as a follow-up.
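One possible way to find de-duplication candidates, sketched under assumptions: collect the message template hard-coded in each helper, normalize whitespace and case, and report templates shared by more than one helper. The helper names and messages below are invented, and exact matching after normalization only catches trivial duplicates; near-duplicates still need human review.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical messages as they might be hard-coded in separate error
# helpers before grouping; names and texts are invented for illustration.
RAW_MESSAGES: Dict[str, str] = {
    "tableNotFoundError": "Table %s not found",
    "tableNotFoundInCatalogError": "Table  %s not found",
    "viewNotFoundError": "View %s not found",
    "divideByZeroError": "divide by zero",
}


def find_duplicate_templates(messages: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each normalized template used by 2+ helpers to those helpers."""
    by_template: Dict[str, List[str]] = defaultdict(list)
    for helper, template in messages.items():
        # Collapse runs of whitespace and ignore case so copies that differ
        # only trivially land on the same key.
        key = " ".join(template.split()).lower()
        by_template[key].append(helper)
    return {key: sorted(helpers) for key, helpers in by_template.items() if len(helpers) > 1}


for template, helpers in find_duplicate_templates(RAW_MESSAGES).items():
    print(f"{template!r}: {helpers}")
# 'table %s not found': ['tableNotFoundError', 'tableNotFoundInCatalogError']
```

Each reported group is a candidate for a single shared error class.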

Here is an example PR that groups a few error messages in the
QueryCompilationErrors class:
[PR 33309|https://github.com/apache/spark/pull/33309].
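The general pattern of routing every throw site through a central errors object that attaches an error class and SQLSTATE can be sketched as follows. This is a hypothetical Python illustration, not the contents of PR 33309: ErrorClassException, QueryCompilationErrorsSketch, and the TABLE_OR_VIEW_NOT_FOUND entry are invented for the example, and the in-memory registry stands in for the JSON file.

```python
from typing import Dict, Tuple

# Hypothetical in-memory registry: error class -> (message template, SQLSTATE).
# It stands in for entries that would be loaded from the shared JSON file.
ERROR_CLASSES: Dict[str, Tuple[str, str]] = {
    "TABLE_OR_VIEW_NOT_FOUND": ("Table or view not found: %s", "42000"),
}


class ErrorClassException(Exception):
    """Exception that carries its error class and SQLSTATE for auditing."""

    def __init__(self, error_class: str, *params: str) -> None:
        template, sql_state = ERROR_CLASSES[error_class]
        super().__init__(template % params)
        self.error_class = error_class
        self.sql_state = sql_state


class QueryCompilationErrorsSketch:
    """Single place that builds compilation errors, so call sites never
    hard-code message text."""

    @staticmethod
    def table_or_view_not_found(name: str) -> ErrorClassException:
        return ErrorClassException("TABLE_OR_VIEW_NOT_FOUND", name)


try:
    raise QueryCompilationErrorsSketch.table_or_view_not_found("db.t")
except ErrorClassException as e:
    print(e.error_class, e.sql_state, e)
    # TABLE_OR_VIEW_NOT_FOUND 42000 Table or view not found: db.t
```

Because callers only pass parameters, changing a message or its SQLSTATE later touches the registry, not the throw sites.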

  was:
To improve auditing, reduce duplication, and improve the quality of error
messages thrown from Spark, we should group them in a single JSON file (as
discussed on the
[mailing list|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Add-error-IDs-td31126.html]
and introduced in
[SPARK-34920|#diff-d41e24da75af19647fadd76ad0b63ecb22b08c0004b07091e4603a30ec0fe013]).
In this file, each error message should be labeled with a consistent error
class and a SQLSTATE.

We will start with the SQL component, building on the exception grouping
done in [SPARK-33539|https://issues.apache.org/jira/browse/SPARK-33539]. In
total, there are ~1000 error messages to group, split across three files
(QueryCompilationErrors, QueryExecutionErrors, and QueryParsingErrors). As a
result, the work has been broken up into three subtasks, each of which
involves grouping the error messages in one of these files.
This work should be done across multiple PRs per subtask so that the changes
are easier to review. For each subtask, leave a comment to claim it (place a
lock) so that merge conflicts are minimized down the line.

As a guideline, the error classes should be de-duplicated as much as possible 
to improve auditing.
We will improve error message quality as a follow-up.

Here is an example PR that groups a few error messages in the
QueryCompilationErrors class:
[PR 33309|https://github.com/apache/spark/pull/33309].


> Group SQL component error messages in Spark error class JSON file
> -----------------------------------------------------------------
>
>                 Key: SPARK-36094
>                 URL: https://issues.apache.org/jira/browse/SPARK-36094
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>    Affects Versions: 3.2.0
>            Reporter: Karen Feng
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
