Good point, Santosh!

I was originally targeting end users who write queries with Spark, as this
is probably the largest user base. But we should definitely consider other
users who deploy and manage Spark clusters. Those users are usually more
tolerant of behavior changes, and I think it should be sufficient to call
out behavior changes in this area in the release notes.

On Wed, May 1, 2024 at 11:18 PM Santosh Pingale <santosh.ping...@adyen.com>
wrote:

> Thanks Wenchen for starting this!
>
> How do we define "the user" for Spark?
> 1. End users: There are some users who use Spark as a service from a
> provider
> 2. Providers/Operators: There are some users who provide Spark as a
> service for their internal (on-prem setup with YARN/K8s) / external
> (something like EMR) customers
> 3. ?
>
> Perhaps we need to consider infrastructure behavior changes as well to
> accommodate the second group of users.
>
> On 1 May 2024, at 06:08, Wenchen Fan <cloud0...@gmail.com> wrote:
>
> Hi all,
>
> It's exciting to see innovations keep happening in the Spark community
> and Spark keep evolving. To make these innovations available to more
> users, it's important to help users upgrade to newer Spark versions
> easily. We've done a good job on this: the PR template requires the
> author to write down user-facing behavior changes, and the migration
> guide lists behavior changes that need attention from users. Sometimes a
> behavior change comes with a legacy config to restore the old behavior.
> However, we still lack a clear definition of behavior changes, so I
> propose the following:
>
> Behavior changes are user-visible functional changes in a new release
> made via public APIs. This means new features, and even bug fixes that
> eliminate NPEs or correct query results, are behavior changes. Things
> like performance improvements, code refactoring, and changes to
> unreleased APIs/features are not. All behavior changes should be called
> out in the PR description. We need to write an item in the migration
> guide (and probably add a legacy config) for those that may break users
> when upgrading:
>
>    - Bug fixes that change query results. Users may need to backfill to
>    correct the existing data and must know about these correctness fixes.
>    - Bug fixes that change the query schema. Users may need to update the
>    schema of the tables in their data pipelines and must know about these
>    changes.
>    - Removing configs
>    - Renaming an error class/condition
>    - Any change to the public Python/SQL/Scala/Java/R APIs: renaming a
>    function, removing parameters, adding parameters, renaming parameters,
>    changing parameter default values, etc. These changes should be avoided
>    in general, or made in a compatible way, like deprecating the old
>    function and adding a new one instead of renaming (see the sketch
>    below).
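>
> As a minimal sketch of the "deprecate and add a new function" approach
> from the last bullet (the names below are hypothetical, not real Spark
> APIs, and this is just one way to do it):
>
>     object WidgetFunctions {
>       // Keep the old API for compatibility and point callers to the
>       // replacement, instead of renaming the function in place.
>       @deprecated("Use parseWidgetV2 instead", "4.0.0")
>       def parseWidget(s: String): Int = parseWidgetV2(s, strict = false)
>
>       // Add the new API alongside the old one, so the changed behavior
>       // is reachable without breaking existing callers.
>       def parseWidgetV2(s: String, strict: Boolean): Int =
>         if (strict) s.trim.toInt
>         else scala.util.Try(s.trim.toInt).getOrElse(0)
>     }
>
>     // Similarly, a behavior change could be guarded by a (hypothetical)
>     // legacy config so users can restore the old behavior:
>     //   spark.conf.set("spark.sql.legacy.lenientWidgetParsing", "true")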
>
> Once we reach a conclusion, I'll document it in
> https://spark.apache.org/versioning-policy.html .
>
> Thanks,
> Wenchen
>
