+1 (non-binding)

I am disappointed, however, that this only mentions APIs and not dependencies 
and transitive dependencies.

As Spark does not provide separation between its runtime classpath and the 
classpath used by applications, I believe Spark's dependencies and transitive 
dependencies should be considered part of the API for this policy.  Breaking 
dependency upgrades and incompatible dependency versions are the source of much 
frustration.
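
As a concrete illustration (a rough sketch only; the version numbers below are 
illustrative, not Spark's actual pins), applications today often end up pinning 
their own dependencies to whatever Spark happens to bundle:

    // build.sbt (sketch)
    libraryDependencies ++= Seq(
      // Spark's jars are provided by the cluster and share the classpath with
      // the application's own dependencies.
      "org.apache.spark" %% "spark-sql" % "3.0.0" % "provided",
      // jackson-databind has to stay on the line Spark ships, or the job can
      // fail at runtime with NoSuchMethodError / NoClassDefFoundError.
      "com.fasterxml.jackson.core" % "jackson-databind" % "2.10.0"
    )

When Spark bumps such a dependency, every application pinned this way breaks in 
much the same sense as a source-level API change, which is why I think 
dependencies belong under this policy as well.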

   michael


> On Mar 9, 2020, at 2:16 PM, Takuya UESHIN <ues...@happy-camper.st> wrote:
> 
> +1 (binding)
> 
> 
> On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang <jiangxb1...@gmail.com 
> <mailto:jiangxb1...@gmail.com>> wrote:
> +1 (non-binding)
> 
> Cheers,
> 
> Xingbo
> 
> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li <lix...@databricks.com 
> <mailto:lix...@databricks.com>> wrote:
> +1 (binding)
> 
> Xiao
> 
> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <denny.g....@gmail.com 
> <mailto:denny.g....@gmail.com>> wrote:
> +1 (non-binding)
> 
> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls...@gmail.com 
> <mailto:gurwls...@gmail.com>> wrote:
> The proposal itself seems good as the factors to consider, Thanks Michael.
> 
> Several of the concerns mentioned look like good points, in particular:
> 
> > ... assuming that this is for public stable APIs, not APIs that are marked 
> > as unstable, evolving, etc. ...
> I would like to confirm this. We already have API annotations such as 
> Experimental, Unstable, etc., and the implications of each still apply. 
> If this is for stable APIs only, it makes sense to me as well.
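> 
> For reference, a minimal sketch of how those annotations look on an API (the 
> class and method names here are hypothetical; the annotations themselves live 
> in org.apache.spark.annotation):
> 
>     import org.apache.spark.annotation.{Evolving, Experimental, Stable}
> 
>     @Stable
>     class QueryRunner {
>       // Stable surface: covered by the proposed policy.
>       def run(sql: String): Unit = ???
> 
>       // Still evolving: explicitly outside the stable-API guarantee.
>       @Evolving
>       def runWithHints(sql: String, hints: Map[String, String]): Unit = ???
> 
>       // Experimental: may change or be removed between releases.
>       @Experimental
>       def explainAsJson(sql: String): String = ???
>     }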
> 
> > ... can we expand on 'when' an API change can occur ?  Since we are 
> > proposing to diverge from semver. ...
> I think this is a good point. If we're proposing to diverge from semver, the 
> delta compared to semver will have to be clarified to avoid different 
> personal interpretations of the somewhat general principles.
> 
> > ... can we narrow down on the migration from Apache Spark 2.4.5 to Apache 
> > Spark 3.0+? ...
> 
> Assuming these concerns will be addressed, +1 (binding).
> 
>  
> 2020년 3월 9일 (월) 오후 4:53, Takeshi Yamamuro <linguin....@gmail.com 
> <mailto:linguin....@gmail.com>>님이 작성:
> +1 (non-binding)
> 
> Bests,
> Takeshi
> 
> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <gengliang.w...@databricks.com 
> <mailto:gengliang.w...@databricks.com>> wrote:
> +1 (non-binding)
> 
> Gengliang
> 
> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <matei.zaha...@gmail.com 
> <mailto:matei.zaha...@gmail.com>> wrote:
> +1 as well.
> 
> Matei
> 
>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com 
>> <mailto:cloud0...@gmail.com>> wrote:
>> 
>> +1 (binding), assuming that this is for public stable APIs, not APIs that 
>> are marked as unstable, evolving, etc.
>> 
>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com 
>> <mailto:ieme...@gmail.com>> wrote:
>> +1 (non-binding)
>> 
>> Michael's section on the trade-offs of maintaining / removing an API is one 
>> of the best reads I have seen on this mailing list. Enthusiastic +1
>> 
>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com 
>> <mailto:dongjoon.h...@gmail.com>> wrote:
>> >
>> > This new policy has a good intention, but can we narrow down on the 
>> > migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>> >
>> > I saw that there already exists a reverting PR to bring back Spark 1.4 and 
>> > 1.5 APIs based on this AS-IS suggestion.
>> >
>> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty, which is 
>> > nice.
>> >
>> > However, for the other cases, it sounds like `recommending older APIs as 
>> > much as possible` because of the following:
>> >
>> >      > How long has the API been in Spark?
>> >
>> > We should be careful when we add a new policy and aim not to mislead users 
>> > and 3rd-party library developers into thinking "older is better".
>> >
>> > Technically, I'm wondering who will use new APIs in their examples (in 
>> > books and on StackOverflow) if they always need to add a warning like `this 
>> > only works on 2.4.0+`.
>> >
>> > Bests,
>> > Dongjoon.
>> >
>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <mri...@gmail.com 
>> > <mailto:mri...@gmail.com>> wrote:
>> >>
>> >> I am in broad agreement with the proposal; like any developer, I prefer
>> >> stable, well-designed APIs :-)
>> >>
>> >> Can we tie the proposal to the stability guarantees given by Spark and the
>> >> reasonable expectations of users?
>> >> In my opinion, an unstable or evolving API could change, while an
>> >> experimental API that has been around for ages should be handled more
>> >> conservatively.
>> >> This raises the question of how the stability guarantees specified by the
>> >> annotations interact with the proposal.
>> >>
>> >> Also, can we expand on 'when' an API change can occur, since we are
>> >> proposing to diverge from semver? In a patch release? A minor release? Only
>> >> a major release? Based on the 'impact' of the API? On its stability
>> >> guarantees?
>> >>
>> >> Regards,
>> >> Mridul
>> >>
>> >>
>> >>
>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <mich...@databricks.com 
>> >> <mailto:mich...@databricks.com>> wrote:
>> >> >
>> >> > I'll start off the vote with a strong +1 (binding).
>> >> >
>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <mich...@databricks.com 
>> >> > <mailto:mich...@databricks.com>> wrote:
>> >> >>
>> >> >> I propose to add the following text to Spark's Semantic Versioning 
>> >> >> policy and adopt it as the rubric that should be used when deciding to 
>> >> >> break APIs (even at major versions such as 3.0).
>> >> >>
>> >> >>
>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As this is 
>> >> >> a procedural vote, the measure will pass if there are more favourable 
>> >> >> votes than unfavourable ones. PMC votes are binding, but the community 
>> >> >> is encouraged to add their voice to the discussion.
>> >> >>
>> >> >>
>> >> >> [ ] +1 - Spark should adopt this policy.
>> >> >>
>> >> >> [ ] -1  - Spark should not adopt this policy.
>> >> >>
>> >> >>
>> >> >> <new policy>
>> >> >>
>> >> >>
>> >> >> Considerations When Breaking APIs
>> >> >>
>> >> >> The Spark project strives to avoid breaking APIs or silently changing 
>> >> >> behavior, even at major versions. While this is not always possible, 
>> >> >> the balance of the following factors should be considered before 
>> >> >> choosing to break an API.
>> >> >>
>> >> >>
>> >> >> Cost of Breaking an API
>> >> >>
>> >> >> Breaking an API almost always has a non-trivial cost to the users of 
>> >> >> Spark. A broken API means that Spark programs need to be rewritten 
>> >> >> before they can be upgraded. However, there are a few considerations 
>> >> >> when thinking about what the cost will be:
>> >> >>
>> >> >> Usage - an API that is actively used in many different places is always 
>> >> >> very costly to break. While it is hard to know usage for sure, there are a 
>> >> >> bunch of ways that we can estimate it:
>> >> >>
>> >> >> How long has the API been in Spark?
>> >> >>
>> >> >> Is the API common even for basic programs?
>> >> >>
>> >> >> How often do we see recent questions in JIRA or mailing lists?
>> >> >>
>> >> >> How often does it appear in StackOverflow or blogs?
>> >> >>
>> >> >> Behavior after the break - How will a program that works today, work 
>> >> >> after the break? The following are listed roughly in order of 
>> >> >> increasing severity:
>> >> >>
>> >> >> Will there be a compiler or linker error?
>> >> >>
>> >> >> Will there be a runtime exception?
>> >> >>
>> >> >> Will that exception happen after significant processing has been done?
>> >> >>
>> >> >> Will we silently return different answers? (very hard to debug, might 
>> >> >> not even notice!)
>> >> >>
>> >> >>
>> >> >> Cost of Maintaining an API
>> >> >>
>> >> >> Of course, the above does not mean that we will never break any APIs. 
>> >> >> We must also consider the cost both to the project and to our users of 
>> >> >> keeping the API in question.
>> >> >>
>> >> >> Project Costs - Every API we have needs to be tested and needs to keep 
>> >> >> working as other parts of the project change. These costs are 
>> >> >> significantly exacerbated when external dependencies change (the JVM, 
>> >> >> Scala, etc). In some cases, while not completely technically 
>> >> >> infeasible, the cost of maintaining a particular API can become too 
>> >> >> high.
>> >> >>
>> >> >> User Costs - APIs also have a cognitive cost to users learning Spark 
>> >> >> or trying to understand Spark programs. This cost becomes even higher 
>> >> >> when the API in question has confusing or undefined semantics.
>> >> >>
>> >> >>
>> >> >> Alternatives to Breaking an API
>> >> >>
>> >> >> In cases where there is a "Bad API", but where the cost of removal is 
>> >> >> also high, there are alternatives that should be considered that do 
>> >> >> not hurt existing users but do address some of the maintenance costs.
>> >> >>
>> >> >>
>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an important 
>> >> >> point. Anytime we are adding a new interface to Spark we should 
>> >> >> consider that we might be stuck with this API forever. Think deeply 
>> >> >> about how new APIs relate to existing ones, as well as how you expect 
>> >> >> them to evolve over time.
>> >> >>
>> >> >> Deprecation Warnings - All deprecation warnings should point to a 
>> >> >> clear alternative and should never just say that an API is deprecated.
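>> >> >> 
>> >> >> For instance (a hedged sketch; the object, method names, and version string 
>> >> >> are made up for illustration), a message that names the replacement:
>> >> >> 
>> >> >>     object Tables {
>> >> >>       def createTable(name: String, schema: Seq[(String, String)]): Unit = ()
>> >> >> 
>> >> >>       // Points the caller at a concrete alternative rather than only
>> >> >>       // saying that the method is deprecated.
>> >> >>       @deprecated("Use createTable(name, schema) instead", "3.0.0")
>> >> >>       def createTable(name: String): Unit = createTable(name, Seq.empty)
>> >> >>     }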
>> >> >>
>> >> >> Updated Docs - Documentation should point to the "best" recommended 
>> >> >> way of performing a given task. In the cases where we maintain legacy 
>> >> >> documentation, we should clearly point to newer APIs and suggest to 
>> >> >> users the "right" way.
>> >> >>
>> >> >> Community Work - Many people learn Spark by reading blogs and other 
>> >> >> sites such as StackOverflow. However, many of these resources are out 
>> >> >> of date. Update them to reduce the cost of eventually removing 
>> >> >> deprecated APIs.
>> >> >>
>> >> >>
>> >> >> </new policy>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
>> >> <mailto:dev-unsubscr...@spark.apache.org>
>> >>
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
>> <mailto:dev-unsubscr...@spark.apache.org>
>> 
> 
> 
> 
> -- 
> ---
> Takeshi Yamamuro
> 
> 
> 
> -- 
> Takuya UESHIN
> 
> http://twitter.com/ueshin <http://twitter.com/ueshin>
