+1 as well.

Matei

> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
> 
> +1 (binding), assuming that this is for public stable APIs, not APIs that are 
> marked as unstable, evolving, etc.
> 
> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com 
> <mailto:ieme...@gmail.com>> wrote:
> +1 (non-binding)
> 
> Michael's section on the trade-offs of maintaining / removing an API is one of
> the best reads I have seen on this mailing list. Enthusiastic +1
> 
> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com 
> <mailto:dongjoon.h...@gmail.com>> wrote:
> >
> > This new policy has a good intention, but can we narrow it down to the 
> > migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
> >
> > I saw that a reverting PR already exists to bring back Spark 1.4 and 1.5 
> > APIs based on this AS-IS suggestion.
> >
> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty, and 
> > that's nice.
> >
> > However, for the other cases, it sounds like `recommending older APIs as 
> > much as possible`, due to the following criterion:
> >
> >      > How long has the API been in Spark?
> >
> > We had better be more careful when we add a new policy, and we should aim 
> > not to mislead users and 3rd-party library developers into thinking that 
> > "older is better".
> >
> > Technically, I'm wondering who will use new APIs in their examples (in 
> > books and on StackOverflow) if they always need to add a warning like 
> > `this only works on 2.4.0+`.
> >
> > Bests,
> > Dongjoon.
> >
> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <mri...@gmail.com 
> > <mailto:mri...@gmail.com>> wrote:
> >>
> >> I am in broad agreement with the proposal; like any developer, I prefer
> >> stable, well-designed APIs :-)
> >>
> >> Can we tie the proposal to the stability guarantees given by Spark and
> >> the reasonable expectations of users?
> >> In my opinion, an unstable or evolving API could change, while an
> >> experimental API which has been around for ages should be handled more
> >> conservatively.
> >> Which raises the question of how the stability guarantees specified by
> >> the annotations interact with the proposal.
> >>
> >> Also, since we are proposing to diverge from semver, can we expand on
> >> 'when' an API change can occur?
> >> Patch releases? Minor releases? Only major releases? Based on the
> >> 'impact' of the API? On stability guarantees?
> >>
> >> Regards,
> >> Mridul
> >>
> >>
> >>
> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <mich...@databricks.com 
> >> <mailto:mich...@databricks.com>> wrote:
> >> >
> >> > I'll start off the vote with a strong +1 (binding).
> >> >
> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <mich...@databricks.com 
> >> > <mailto:mich...@databricks.com>> wrote:
> >> >>
> >> >> I propose to add the following text to Spark's Semantic Versioning 
> >> >> policy and adopt it as the rubric that should be used when deciding to 
> >> >> break APIs (even at major versions such as 3.0).
> >> >>
> >> >>
> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As this is a 
> >> >> procedural vote, the measure will pass if there are more favourable 
> >> >> votes than unfavourable ones. PMC votes are binding, but the community 
> >> >> is encouraged to add their voice to the discussion.
> >> >>
> >> >>
> >> >> [ ] +1 - Spark should adopt this policy.
> >> >>
> >> >> [ ] -1  - Spark should not adopt this policy.
> >> >>
> >> >>
> >> >> <new policy>
> >> >>
> >> >>
> >> >> Considerations When Breaking APIs
> >> >>
> >> >> The Spark project strives to avoid breaking APIs or silently changing 
> >> >> behavior, even at major versions. While this is not always possible, 
> >> >> the balance of the following factors should be considered before 
> >> >> choosing to break an API.
> >> >>
> >> >>
> >> >> Cost of Breaking an API
> >> >>
> >> >> Breaking an API almost always has a non-trivial cost to the users of 
> >> >> Spark. A broken API means that Spark programs need to be rewritten 
> >> >> before they can be upgraded. However, there are a few considerations 
> >> >> when thinking about what the cost will be:
> >> >>
> >> >> Usage - an API that is actively used in many different places is 
> >> >> always very costly to break. While it is hard to know usage for sure, 
> >> >> there are several ways that we can estimate it:
> >> >>
> >> >> How long has the API been in Spark?
> >> >>
> >> >> Is the API common even for basic programs?
> >> >>
> >> >> How often do we see recent questions in JIRA or mailing lists?
> >> >>
> >> >> How often does it appear in StackOverflow or blogs?
> >> >>
> >> >> Behavior after the break - How will a program that works today work 
> >> >> after the break? The following are listed roughly in order of 
> >> >> increasing severity:
> >> >>
> >> >> Will there be a compiler or linker error?
> >> >>
> >> >> Will there be a runtime exception?
> >> >>
> >> >> Will that exception happen after significant processing has been done?
> >> >>
> >> >> Will we silently return different answers? (very hard to debug, might 
> >> >> not even notice!)
> >> >>
> >> >>
> >> >> Cost of Maintaining an API
> >> >>
> >> >> Of course, the above does not mean that we will never break any APIs. 
> >> >> We must also consider the cost both to the project and to our users of 
> >> >> keeping the API in question.
> >> >>
> >> >> Project Costs - Every API we have needs to be tested and needs to keep 
> >> >> working as other parts of the project change. These costs are 
> >> >> significantly exacerbated when external dependencies change (the JVM, 
> >> >> Scala, etc.). In some cases, while maintaining a particular API is not 
> >> >> completely technically infeasible, its cost can become too 
> >> >> high.
> >> >>
> >> >> User Costs - APIs also have a cognitive cost to users learning Spark or 
> >> >> trying to understand Spark programs. This cost becomes even higher when 
> >> >> the API in question has confusing or undefined semantics.
> >> >>
> >> >>
> >> >> Alternatives to Breaking an API
> >> >>
> >> >> In cases where there is a "Bad API", but where the cost of removal is 
> >> >> also high, there are alternatives that should be considered that do not 
> >> >> hurt existing users but do address some of the maintenance costs.
> >> >>
> >> >>
> >> >> Avoid Bad APIs - While this is a bit obvious, it is an important point. 
> >> >> Anytime we are adding a new interface to Spark we should consider that 
> >> >> we might be stuck with this API forever. Think deeply about how new 
> >> >> APIs relate to existing ones, as well as how you expect them to evolve 
> >> >> over time.
> >> >>
> >> >> Deprecation Warnings - All deprecation warnings should point to a clear 
> >> >> alternative and should never just say that an API is deprecated.
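> >> >>
> >> >> As a rough illustration (a hypothetical sketch, not an existing Spark 
> >> >> API; the method names here are invented for the example), a Scala 
> >> >> deprecation annotation can name both the replacement and the release 
> >> >> in which the deprecation happened, so the compiler warning is 
> >> >> actionable on its own:
> >> >>
> >> >>   object TableWriter {
> >> >>     // Deprecated entry point: the warning names the concrete
> >> >>     // replacement and the version, rather than only saying that
> >> >>     // the method is deprecated.
> >> >>     @deprecated("Use saveAsTable(name, overwrite = true) instead", "3.0.0")
> >> >>     def overwriteTable(name: String): Unit =
> >> >>       saveAsTable(name, overwrite = true)
> >> >>
> >> >>     // Recommended replacement that the warning points to.
> >> >>     def saveAsTable(name: String, overwrite: Boolean): Unit = {
> >> >>       println(s"writing $name (overwrite = $overwrite)")
> >> >>     }
> >> >>   }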
> >> >>
> >> >> Updated Docs - Documentation should point to the "best" recommended way 
> >> >> of performing a given task. In the cases where we maintain legacy 
> >> >> documentation, we should clearly point to newer APIs and suggest to 
> >> >> users the "right" way.
> >> >>
> >> >> Community Work - Many people learn Spark by reading blogs and other 
> >> >> sites such as StackOverflow. However, many of these resources are out 
> >> >> of date. Update them to reduce the cost of eventually removing 
> >> >> deprecated APIs.
> >> >>
> >> >>
> >> >> </new policy>
> >>
> >>
> 
> 
