[ 
https://issues.apache.org/jira/browse/SPARK-53507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-53507.
----------------------------------
    Fix Version/s: 4.1.0
       Resolution: Fixed

Issue resolved by pull request 52256
[https://github.com/apache/spark/pull/52256]

> Add Breaking Change info to Spark error classes
> -----------------------------------------------
>
>                 Key: SPARK-53507
>                 URL: https://issues.apache.org/jira/browse/SPARK-53507
>             Project: Spark
>          Issue Type: Task
>          Components: Spark Core
>    Affects Versions: 4.1.0
>            Reporter: Ian Markowitz
>            Assignee: Ian Markowitz
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> Users of Apache Spark often have their jobs break when upgrading to a new 
> version. We'd like to improve this using config flags and a concept called 
> "Breaking Change Info".
> This is an example of a breaking change:
> {quote}Since Spark 4.1, `mapInPandas` and `mapInArrow` enforce strict 
> validation of the result against the schema. Column names must match 
> exactly, and types must match with compatible nullability. To restore the 
> previous behavior, set 
> `spark.sql.execution.arrow.pyspark.validateSchema.enabled` to `false`.
> {quote}
>  
> This can be mitigated as follows:
>  * When the breaking change is created, we define an error class with a 
> `breakingChangeInfo` object. This includes a message, a Spark config, and a 
> flag indicating if the mitigation could be applied automatically.
> Example:
> {code:java}
> "MAP_VALIDATION_ERROR": {
>   "message": [
>     "Result validation failed: The schema does not match the expected 
> schema."
>   ],
>   "breakingChangeInfo": {
>     "migrationMessage": [
>       "To disable strict result validation, set 
> `spark.sql.execution.arrow.pyspark.validateSchema.enabled` to `false`"
>     ],
>     "mitigationSparkConfig": {
>       "key": "spark.sql.execution.arrow.pyspark.validateSchema.enabled",
>       "value": "false"
>     },
>     "autoMitigation": true
>   }
> }
> {code}
>  * In the Spark code, when this particular breaking change is hit, we always 
> throw an error with the matching error class.
>  * A platform running the Spark job can handle this error by re-running the 
> job with the specified config applied. This enables automatic, successful 
> retries of the job with the breaking change mitigated.
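
The platform-side retry flow described in the last bullet can be sketched as follows. This is a hedged illustration only, not code from the pull request: the `ERROR_CLASSES` dict mirrors the JSON example above, and `mitigation_config` is a hypothetical helper a platform might implement.

```python
from typing import Optional

# Mirrors the error-class JSON example from the issue description.
ERROR_CLASSES = {
    "MAP_VALIDATION_ERROR": {
        "breakingChangeInfo": {
            "mitigationSparkConfig": {
                "key": "spark.sql.execution.arrow.pyspark.validateSchema.enabled",
                "value": "false",
            },
            "autoMitigation": True,
        }
    }
}

def mitigation_config(error_class: str) -> Optional[dict]:
    """Return extra Spark conf for an auto-mitigable breaking change, else None."""
    info = ERROR_CLASSES.get(error_class, {}).get("breakingChangeInfo")
    if info and info.get("autoMitigation"):
        conf = info["mitigationSparkConfig"]
        return {conf["key"]: conf["value"]}
    return None
```

On a failed run, the platform would look up the job's error class; a non-None result means the job can be retried automatically with that config merged into the session conf.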



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
