+1 (non-binding). Michael's section on the trade-offs of maintaining / removing an API is one of the best reads I have seen on this mailing list. Enthusiastic +1.
On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
> This new policy has a good intention, but can we narrow down on the
> migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>
> I saw that there already exists a reverting PR to bring back Spark 1.4
> and 1.5 APIs based on this AS-IS suggestion.
>
> The AS-IS policy clearly mentions the JVM/Scala-level difficulty, and
> that is nice.
>
> However, for the other cases, it sounds like `recommending older APIs
> as much as possible` because of the following:
>
> > How long has the API been in Spark?
>
> We had better be more careful when we add a new policy, and should aim
> not to mislead users and third-party library developers into thinking
> "older is better".
>
> Technically, I'm wondering who will use new APIs in their examples (in
> books and on StackOverflow) if they always need to add a warning like
> `this only works at 2.4.0+`.
>
> Bests,
> Dongjoon.
>
> On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>>
>> I am in broad agreement with the proposal; like any developer, I
>> prefer stable, well-designed APIs :-)
>>
>> Can we tie the proposal to the stability guarantees given by Spark
>> and the reasonable expectations of users?
>> In my opinion, an unstable or evolving API could change, while an
>> experimental API which has been around for ages should be handled
>> more conservatively.
>> Which raises the question of how the stability guarantees specified
>> by annotations interact with the proposal.
>>
>> Also, can we expand on 'when' an API change can occur, since we are
>> proposing to diverge from semver? Patch release? Minor release? Only
>> major release? Based on the 'impact' of the API? Stability guarantees?
>>
>> Regards,
>> Mridul
>>
>> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>> >
>> > I'll start off the vote with a strong +1 (binding).
>> >
>> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>> >>
>> >> I propose to add the following text to Spark's Semantic Versioning
>> >> policy and adopt it as the rubric that should be used when deciding
>> >> to break APIs (even at major versions such as 3.0).
>> >>
>> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As this
>> >> is a procedural vote, the measure will pass if there are more
>> >> favourable votes than unfavourable ones. PMC votes are binding, but
>> >> the community is encouraged to add their voice to the discussion.
>> >>
>> >> [ ] +1 - Spark should adopt this policy.
>> >> [ ] -1 - Spark should not adopt this policy.
>> >>
>> >> <new policy>
>> >>
>> >> Considerations When Breaking APIs
>> >>
>> >> The Spark project strives to avoid breaking APIs or silently
>> >> changing behavior, even at major versions. While this is not always
>> >> possible, the balance of the following factors should be considered
>> >> before choosing to break an API.
>> >>
>> >> Cost of Breaking an API
>> >>
>> >> Breaking an API almost always has a non-trivial cost to the users
>> >> of Spark. A broken API means that Spark programs need to be
>> >> rewritten before they can be upgraded. However, there are a few
>> >> considerations when thinking about what the cost will be:
>> >>
>> >> Usage - an API that is actively used in many different places is
>> >> always very costly to break.
>> >> While it is hard to know usage for sure, there are a bunch of ways
>> >> that we can estimate it:
>> >>
>> >> How long has the API been in Spark?
>> >>
>> >> Is the API common even for basic programs?
>> >>
>> >> How often do we see recent questions in JIRA or mailing lists?
>> >>
>> >> How often does it appear in StackOverflow or blogs?
>> >>
>> >> Behavior after the break - How will a program that works today work
>> >> after the break? The following are listed roughly in order of
>> >> increasing severity:
>> >>
>> >> Will there be a compiler or linker error?
>> >>
>> >> Will there be a runtime exception?
>> >>
>> >> Will that exception happen after significant processing has been
>> >> done?
>> >>
>> >> Will we silently return different answers? (very hard to debug;
>> >> users might not even notice!)
>> >>
>> >> Cost of Maintaining an API
>> >>
>> >> Of course, the above does not mean that we will never break any
>> >> APIs. We must also consider the cost, both to the project and to
>> >> our users, of keeping the API in question.
>> >>
>> >> Project Costs - Every API we have needs to be tested and needs to
>> >> keep working as other parts of the project change. These costs are
>> >> significantly exacerbated when external dependencies change (the
>> >> JVM, Scala, etc.). In some cases, while maintaining a particular
>> >> API is not technically infeasible, its cost can become too high.
>> >>
>> >> User Costs - APIs also have a cognitive cost to users learning
>> >> Spark or trying to understand Spark programs. This cost becomes
>> >> even higher when the API in question has confusing or undefined
>> >> semantics.
>> >>
>> >> Alternatives to Breaking an API
>> >>
>> >> In cases where there is a "Bad API", but where the cost of removal
>> >> is also high, there are alternatives that should be considered that
>> >> do not hurt existing users but do address some of the maintenance
>> >> costs.
>> >>
>> >> Avoid Bad APIs - While this is a bit obvious, it is an important
>> >> point. Anytime we are adding a new interface to Spark we should
>> >> consider that we might be stuck with this API forever. Think deeply
>> >> about how new APIs relate to existing ones, as well as how you
>> >> expect them to evolve over time.
>> >>
>> >> Deprecation Warnings - All deprecation warnings should point to a
>> >> clear alternative and should never just say that an API is
>> >> deprecated.
>> >>
>> >> Updated Docs - Documentation should point to the "best" recommended
>> >> way of performing a given task. In the cases where we maintain
>> >> legacy documentation, we should clearly point to newer APIs and
>> >> suggest to users the "right" way.
>> >>
>> >> Community Work - Many people learn Spark by reading blogs and other
>> >> sites such as StackOverflow. However, many of these resources are
>> >> out of date. Updating them reduces the cost of eventually removing
>> >> deprecated APIs.
>> >>
>> >> </new policy>
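To make the "Deprecation Warnings" point above concrete, here is a
minimal sketch in Scala (Spark's implementation language). OldApi,
NewApi, and their methods are hypothetical names for illustration, not
real Spark APIs:

    object NewApi {
      // The recommended replacement going forward.
      def sum(xs: Seq[Int]): Int = xs.sum
    }

    object OldApi {
      // A bare message like "this method is deprecated" gives users
      // nothing to act on. Instead, name the alternative and the
      // version in which the deprecation started:
      @deprecated("Use NewApi.sum instead", "3.0.0")
      def legacySum(xs: Seq[Int]): Int = NewApi.sum(xs)
    }

Anyone compiling a call to OldApi.legacySum then sees a warning that
already names the migration target, which is what the policy asks for.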
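And on Mridul's question about stability annotations: Spark ships
audience/stability annotations in org.apache.spark.annotation (for
example @Experimental and @DeveloperApi; newer versions also have
annotations like @Evolving and @Unstable, if I remember correctly).
Below is a sketch of one possible mapping onto the proposed rubric; the
mapping is my own assumption for discussion, not settled policy, and
both types are made up:

    import org.apache.spark.annotation.{DeveloperApi, Experimental}

    // Assumed mapping (for discussion only, not project policy):
    //   @Experimental - may change in any release while feedback is
    //                   gathered, but weigh the usage/age rubric first
    //   @DeveloperApi - aimed at advanced users; minor-release changes
    //                   seem acceptable, ideally with a deprecation cycle
    //   unannotated   - treat as stable; break only after the full
    //                   cost/benefit analysis in the policy

    @Experimental
    class ShinyNewSource {            // a new API that is still settling
      def poll(): Option[String] = None
    }

    @DeveloperApi
    trait InternalMetricsListener {   // a plug-in point for advanced users
      def onEvent(event: String): Unit
    }

If the proposal passes, it would help to state explicitly which of
these annotation levels the new rubric applies to.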