Hi all,

It's exciting to see innovation continue in the Spark community and Spark itself keep evolving. To make these innovations available to more users, it's important to help users upgrade to newer Spark versions easily. We've done a good job of this: the PR template requires the author to write down user-facing behavior changes, and the migration guide lists behavior changes that need attention from users. Sometimes a behavior change comes with a legacy config to restore the old behavior.

However, we still lack a clear definition of behavior changes, and I propose the following definition:
Behavior changes mean user-visible functional changes in a new release via public APIs. This means new features, and even bug fixes that eliminate NPE or correct query results, are behavior changes. Things like performance improvements, code refactoring, and changes to unreleased APIs/features are not.

All behavior changes should be called out in the PR description. We need to write an item in the migration guide (and probably add a legacy config) for those that may break users when upgrading:

- Bug fixes that change query results. Users may need to do a backfill to correct the existing data and must know about these correctness fixes.
- Bug fixes that change the query schema. Users may need to update the schema of the tables in their data pipelines and must know about these changes.
- Removing configs.
- Renaming an error class/condition.
- Any change to the public Python/SQL/Scala/Java/R APIs: renaming a function, removing parameters, adding parameters, renaming parameters, changing parameter default values, etc. These changes should be avoided in general, or done in a compatible way, such as deprecating the old function and adding a new one instead of renaming.

Once we reach a conclusion, I'll document it in https://spark.apache.org/versioning-policy.html .

Thanks,
Wenchen