+1 (non-binding)

Bests,
Takeshi
On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <gengliang.w...@databricks.com> wrote:

> +1 (non-binding)
>
> Gengliang
>
> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
>> +1 as well.
>>
>> Matei
>>
>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>> +1 (binding), assuming that this is for public stable APIs, not APIs that are marked as unstable, evolving, etc.
>>
>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Michael's section on the trade-offs of maintaining / removing an API is one of the best reads I have seen on this mailing list. Enthusiastic +1
>>>
>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>> >
>>> > This new policy has a good intention, but can we narrow it down to the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>> >
>>> > I saw that there already exists a reverting PR to bring back Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
>>> >
>>> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty, and that's nice.
>>> >
>>> > However, for the other cases, it sounds like `recommending older APIs as much as possible` because of the following:
>>> >
>>> > > How long has the API been in Spark?
>>> >
>>> > We had better be more careful when we add a new policy, and we should aim not to mislead users and 3rd-party library developers into thinking "older is better".
>>> >
>>> > Technically, I'm wondering who will use new APIs in their examples (in books and on StackOverflow) if they always need to add a warning like `this only works on 2.4.0+`.
>>> >
>>> > Bests,
>>> > Dongjoon.
>>> >
>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>>> >>
>>> >> I am in broad agreement with the proposal; as any developer, I prefer stable, well-designed APIs :-)
>>> >>
>>> >> Can we tie the proposal to the stability guarantees given by Spark and the reasonable expectations of users?
>>> >> In my opinion, an unstable or evolving API could change, while an experimental API which has been around for ages should be handled more conservatively.
>>> >> Which raises the question of how the stability guarantees specified by annotations interact with the proposal.
>>> >>
>>> >> Also, can we expand on 'when' an API change can occur, since we are proposing to diverge from semver?
>>> >> Patch release? Minor release? Only major release? Based on the 'impact' of the API? Stability guarantees?
>>> >>
>>> >> Regards,
>>> >> Mridul
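For context on the annotations Mridul and Wenchen refer to, a minimal Scala sketch of how Spark's stability markers are applied is shown below. It assumes a Spark 3.x dependency on the classpath for org.apache.spark.annotation; the traits themselves are invented for illustration and are not real Spark APIs.

```scala
// Sketch only: the traits are hypothetical; the annotations are the stability
// markers shipped in org.apache.spark.annotation (Spark 3.x assumption).
import org.apache.spark.annotation.{Evolving, Experimental, Stable}

@Stable          // intended to stay compatible across releases
trait ExampleStableReader {
  def read(path: String): Unit
}

@Evolving        // may still change between minor releases
trait ExampleEvolvingReader {
  def readStream(path: String): Unit
}

@Experimental    // no compatibility promise yet
trait ExampleExperimentalReader {
  def readBatch(path: String): Unit
}
```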
>>> >>
>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>>> >> >
>>> >> > I'll start off the vote with a strong +1 (binding).
>>> >> >
>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>>> >> >>
>>> >> >> I propose to add the following text to Spark's Semantic Versioning policy and adopt it as the rubric that should be used when deciding to break APIs (even at major versions such as 3.0).
>>> >> >>
>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As this is a procedural vote, the measure will pass if there are more favourable votes than unfavourable ones. PMC votes are binding, but the community is encouraged to add their voice to the discussion.
>>> >> >>
>>> >> >> [ ] +1 - Spark should adopt this policy.
>>> >> >> [ ] -1 - Spark should not adopt this policy.
>>> >> >>
>>> >> >> <new policy>
>>> >> >>
>>> >> >> Considerations When Breaking APIs
>>> >> >>
>>> >> >> The Spark project strives to avoid breaking APIs or silently changing behavior, even at major versions. While this is not always possible, the balance of the following factors should be considered before choosing to break an API.
>>> >> >>
>>> >> >> Cost of Breaking an API
>>> >> >>
>>> >> >> Breaking an API almost always has a non-trivial cost to the users of Spark. A broken API means that Spark programs need to be rewritten before they can be upgraded. However, there are a few considerations when thinking about what the cost will be:
>>> >> >>
>>> >> >> Usage - An API that is actively used in many different places is always very costly to break. While it is hard to know usage for sure, there are a bunch of ways that we can estimate:
>>> >> >> How long has the API been in Spark?
>>> >> >> Is the API common even for basic programs?
>>> >> >> How often do we see recent questions in JIRA or mailing lists?
>>> >> >> How often does it appear in StackOverflow or blogs?
>>> >> >>
>>> >> >> Behavior after the break - How will a program that works today work after the break? The following are listed roughly in order of increasing severity:
>>> >> >> Will there be a compiler or linker error?
>>> >> >> Will there be a runtime exception?
>>> >> >> Will that exception happen after significant processing has been done?
>>> >> >> Will we silently return different answers? (very hard to debug, might not even notice!)
>>> >> >>
>>> >> >> Cost of Maintaining an API
>>> >> >>
>>> >> >> Of course, the above does not mean that we will never break any APIs. We must also consider the cost, both to the project and to our users, of keeping the API in question.
>>> >> >>
>>> >> >> Project Costs - Every API we have needs to be tested and needs to keep working as other parts of the project change. These costs are significantly exacerbated when external dependencies change (the JVM, Scala, etc.). In some cases, while not completely technically infeasible, the cost of maintaining a particular API can become too high.
>>> >> >>
>>> >> >> User Costs - APIs also have a cognitive cost to users learning Spark or trying to understand Spark programs. This cost becomes even higher when the API in question has confusing or undefined semantics.
>>> >> >>
>>> >> >> Alternatives to Breaking an API
>>> >> >>
>>> >> >> In cases where there is a "Bad API", but where the cost of removal is also high, there are alternatives that should be considered that do not hurt existing users but do address some of the maintenance costs.
>>> >> >>
>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an important point. Anytime we are adding a new interface to Spark we should consider that we might be stuck with this API forever. Think deeply about how new APIs relate to existing ones, as well as how you expect them to evolve over time.
>>> >> >>
>>> >> >> Deprecation Warnings - All deprecation warnings should point to a clear alternative and should never just say that an API is deprecated.
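As a small illustration of that guideline, here is a minimal Scala sketch of a deprecation message that names its replacement; the object and method names are made up and are not real Spark APIs.

```scala
// Hypothetical API used only to illustrate the guideline above.
object ExampleColumnOps {
  def renameColumns(mapping: Map[String, String]): Unit =
    println(s"renaming: $mapping")

  // Good: the message names the replacement and the release the deprecation landed in.
  @deprecated("Use ExampleColumnOps.renameColumns(Map(from -> to)) instead", "3.0.0")
  def renameColumn(from: String, to: String): Unit =
    renameColumns(Map(from -> to))
}

object DeprecationDemo extends App {
  // Compiling with -deprecation prints a warning that points straight at the alternative.
  ExampleColumnOps.renameColumn("old_name", "new_name")
}
```

A message that only said "this method is deprecated" would fail the guideline, since it leaves users to hunt for the alternative themselves.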
>>> >> >> Updated Docs - Documentation should point to the "best" recommended way of performing a given task. In the cases where we maintain legacy documentation, we should clearly point to newer APIs and suggest to users the "right" way.
>>> >> >>
>>> >> >> Community Work - Many people learn Spark by reading blogs and other sites such as StackOverflow. However, many of these resources are out of date. Update them to reduce the cost of eventually removing deprecated APIs.
>>> >> >>
>>> >> >> </new policy>

--
---
Takeshi Yamamuro