I'll start off the vote with a strong +1 (binding).

On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <mich...@databricks.com>
wrote:

> I propose to add the following text to Spark's Semantic Versioning policy
> <https://spark.apache.org/versioning-policy.html> and adopt it as the
> rubric that should be used when deciding to break APIs (even at major
> versions such as 3.0).
>
>
> I'll leave the vote open until Tuesday, March 10th at 2pm. As this is a 
> procedural
> vote <https://www.apache.org/foundation/voting.html>, the measure will
> pass if there are more favourable votes than unfavourable ones. PMC votes
> are binding, but the community is encouraged to add their voice to the
> discussion.
>
>
> [ ] +1 - Spark should adopt this policy.
>
> [ ] -1  - Spark should not adopt this policy.
>
>
> <new policy>
>
>
> Considerations When Breaking APIs
>
> The Spark project strives to avoid breaking APIs or silently changing
> behavior, even at major versions. While this is not always possible, the
> balance of the following factors should be considered before choosing to
> break an API.
>
> Cost of Breaking an API
>
> Breaking an API almost always has a non-trivial cost to the users of
> Spark. A broken API means that Spark programs need to be rewritten before
> they can be upgraded. However, there are a few considerations when thinking
> about what the cost will be:
>
>    - Usage - an API that is actively used in many different places is
>      always very costly to break. While it is hard to know usage for sure,
>      there are several ways we can estimate it:
>        - How long has the API been in Spark?
>        - Is the API common even for basic programs?
>        - How often do we see recent questions in JIRA or on the mailing
>          lists?
>        - How often does it appear in StackOverflow or blogs?
>
>    - Behavior after the break - How will a program that works today work
>      after the break? The following are listed roughly in order of
>      increasing severity (a small hypothetical sketch follows this list):
>        - Will there be a compiler or linker error?
>        - Will there be a runtime exception?
>        - Will that exception happen after significant processing has been
>          done?
>        - Will we silently return different answers? (Very hard to debug;
>          users might not even notice!)
>
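> A small hypothetical sketch (in Scala; none of these names are real Spark
> APIs) of the two ends of that severity ladder: an outright removal is
> caught at compile time, while a silent semantic change compiles, runs, and
> quietly returns a different answer.
>
>    object BreakageSeverity {
>      // 1) If a method is removed outright, old code simply fails to compile:
>      //      data.summarize()   // would now be a compiler error, caught immediately.
>      // 2) If instead its semantics change (say, an average that once skipped
>      //    missing values now treats them as zero), old code still compiles
>      //    and runs, but silently returns a different answer:
>      def averageSkippingMissing(xs: Seq[Option[Double]]): Double = {
>        val present = xs.flatten
>        present.sum / present.size
>      }
>
>      def averageMissingAsZero(xs: Seq[Option[Double]]): Double =
>        xs.map(_.getOrElse(0.0)).sum / xs.size
>
>      def main(args: Array[String]): Unit = {
>        val xs = Seq(Some(2.0), None, Some(4.0))
>        println(averageSkippingMissing(xs)) // 3.0 ("old" behavior)
>        println(averageMissingAsZero(xs))   // 2.0 ("new" behavior: no error, no warning)
>      }
>    }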
>
> Cost of Maintaining an API
>
> Of course, the above does not mean that we will never break any APIs. We
> must also consider the cost both to the project and to our users of keeping
> the API in question.
>
>    - Project Costs - Every API we have needs to be tested and needs to
>      keep working as other parts of the project change. These costs are
>      significantly exacerbated when external dependencies change (the JVM,
>      Scala, etc.). In some cases, while not completely technically
>      infeasible, the cost of maintaining a particular API can become too
>      high.
>
>    - User Costs - APIs also have a cognitive cost to users learning Spark
>      or trying to understand Spark programs. This cost becomes even higher
>      when the API in question has confusing or undefined semantics.
>
> Alternatives to Breaking an API
>
> In cases where there is a "Bad API", but where the cost of removal is also
> high, there are alternatives that should be considered that do not hurt
> existing users but do address some of the maintenance costs.
>
>
>    - Avoid Bad APIs - While this is a bit obvious, it is an important
>      point. Any time we are adding a new interface to Spark we should
>      consider that we might be stuck with this API forever. Think deeply
>      about how new APIs relate to existing ones, as well as how you expect
>      them to evolve over time.
>
>    - Deprecation Warnings - All deprecation warnings should point to a
>      clear alternative and should never just say that an API is deprecated
>      (a minimal sketch follows this list).
>
>    - Updated Docs - Documentation should point to the "best" recommended
>      way of performing a given task. In the cases where we maintain legacy
>      documentation, we should clearly point to newer APIs and suggest to
>      users the "right" way.
>
>    - Community Work - Many people learn Spark by reading blogs and other
>      sites such as StackOverflow. However, many of these resources are out
>      of date. Update them to reduce the cost of eventually removing
>      deprecated APIs.
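>
> As a minimal illustrative sketch (the method names here are hypothetical,
> not part of any Spark API), a deprecation warning that follows this
> guidance names its replacement and delegates to it, so users know exactly
> what to change:
>
>    object LegacyApi {
>      @deprecated("Use sumLongs instead; sumInts can overflow on large inputs", "3.0.0")
>      def sumInts(xs: Seq[Int]): Int = sumLongs(xs.map(_.toLong)).toInt
>
>      def sumLongs(xs: Seq[Long]): Long = xs.sum
>    }
>
> By contrast, a message that only says "this API is deprecated" leaves
> users guessing about the supported path forward.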
>
>
> </new policy>
>
