[
https://issues.apache.org/jira/browse/SPARK-53507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-53507.
----------------------------------
Fix Version/s: 4.1.0
Resolution: Fixed
Issue resolved by pull request 52256
[https://github.com/apache/spark/pull/52256]
> Add Breaking Change info to Spark error classes
> -----------------------------------------------
>
> Key: SPARK-53507
> URL: https://issues.apache.org/jira/browse/SPARK-53507
> Project: Spark
> Issue Type: Task
> Components: Spark Core
> Affects Versions: 4.1.0
> Reporter: Ian Markowitz
> Assignee: Ian Markowitz
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Users of Apache Spark often have their jobs break when upgrading to a new
> version. We'd like to mitigate this using config flags and a concept called
> "Breaking Change Info".
> This is an example of a breaking change:
> {quote}Since Spark 4.1, `mapInPandas` and `mapInArrow` enforce strict
> validation of the result against the schema. Column names must match
> exactly, and types must match with compatible nullability. To restore the
> previous behavior, set
> `spark.sql.execution.arrow.pyspark.validateSchema.enabled` to `false`.
> {quote}
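> For instance, the mitigation in the note above amounts to a single session
> configuration (a sketch; assumes an active PySpark session named `spark`):
> {code:python}
> # Restore the pre-4.1 behavior: disable strict result-schema validation
> # for mapInPandas / mapInArrow (config name taken from the note above).
> spark.conf.set("spark.sql.execution.arrow.pyspark.validateSchema.enabled", "false")
> {code}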
>
> This can be mitigated as follows:
> * When a breaking change is introduced, we define an error class with a
> `breakingChangeInfo` object. This includes a message, a Spark config, and a
> flag indicating whether the mitigation can be applied automatically.
> Example:
> {code:java}
> "MAP_VALIDATION_ERROR": {
>   "message": [
>     "Result validation failed: The schema does not match the expected schema."
>   ],
>   "breakingChangeInfo": {
>     "migrationMessage": [
>       "To disable strict result validation, set `spark.sql.execution.arrow.pyspark.validateSchema.enabled` to `false`."
>     ],
>     "mitigationSparkConfig": {
>       "key": "spark.sql.execution.arrow.pyspark.validateSchema.enabled",
>       "value": "false"
>     },
>     "autoMitigation": true
>   }
> }
> {code}
> * In the Spark code, whenever this particular breaking change is hit, we
> always throw an error with the matching error class.
> * A platform running the Spark job can handle this error by re-running the
> job with the specified config applied. This enables automatic, successful
> retries of the job with the breaking change mitigated.
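The retry flow described above could look roughly like this (a minimal sketch in plain stdlib Python; the payload shape follows the example error class above, but the helper name `mitigation_conf` and the payload variable are hypothetical, not part of any Spark API):

```python
import json

# Example error payload, shaped like the "MAP_VALIDATION_ERROR" class above.
error_json = """
{
  "errorClass": "MAP_VALIDATION_ERROR",
  "breakingChangeInfo": {
    "migrationMessage": [
      "To disable strict result validation, set spark.sql.execution.arrow.pyspark.validateSchema.enabled to false."
    ],
    "mitigationSparkConfig": {
      "key": "spark.sql.execution.arrow.pyspark.validateSchema.enabled",
      "value": "false"
    },
    "autoMitigation": true
  }
}
"""

def mitigation_conf(error_payload: str) -> dict:
    """Return the Spark conf to apply on retry, or {} if the error
    carries no auto-applicable breaking-change mitigation."""
    info = json.loads(error_payload).get("breakingChangeInfo")
    if not info or not info.get("autoMitigation"):
        return {}
    conf = info["mitigationSparkConfig"]
    return {conf["key"]: conf["value"]}

extra_conf = mitigation_conf(error_json)
# A platform would now resubmit the failed job with extra_conf merged
# into its Spark configuration.
print(extra_conf)
```

The key design point is that the platform never needs to understand the specific breaking change: `autoMitigation` tells it whether a retry is safe, and `mitigationSparkConfig` tells it exactly which config to apply.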
--
This message was sent by Atlassian Jira
(v8.20.10#820010)