+1 (non-binding)

Bests,
Takeshi
On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <gengliang.w...@databricks.com> wrote:

> +1 (non-binding)
>
> Gengliang
>
> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
>> +1 as well.
>>
>> Matei
>>
>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>> +1 (binding), assuming that this is for public stable APIs, not APIs that are marked as unstable, evolving, etc.
>>
>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Michael's section on the trade-offs of maintaining / removing an API is one of the best reads I have seen on this mailing list. Enthusiastic +1
>>>
>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>> >
>>> > This new policy has a good intention, but can we narrow it down to the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>> >
>>> > I saw that there already exists a reverting PR to bring back Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
>>> >
>>> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty, and that's nice.
>>> >
>>> > However, for the other cases, it sounds like `recommending older APIs as much as possible` because of the following:
>>> >
>>> > > How long has the API been in Spark?
>>> >
>>> > We had better be more careful when we add a new policy, and we should aim not to mislead users and 3rd-party library developers into thinking "older is better".
>>> >
>>> > Technically, I'm wondering who will use new APIs in their examples (in books and on StackOverflow) if they always need to add a warning like `this only works on 2.4.0+`.
>>> >
>>> > Bests,
>>> > Dongjoon.
>>> >
>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>>> >>
>>> >> I am in broad agreement with the proposal; as any developer, I prefer stable, well-designed APIs :-)
>>> >>
>>> >> Can we tie the proposal to the stability guarantees given by Spark and the reasonable expectations of users?
>>> >> In my opinion, an unstable or evolving API could change, while an experimental API which has been around for ages should be handled more conservatively.
>>> >> Which raises the question of how the stability guarantees specified by annotations interact with the proposal.
>>> >>
>>> >> Also, can we expand on 'when' an API change can occur, since we are proposing to diverge from semver?
>>> >> Patch release? Minor release? Only major release? Based on the 'impact' of the API? Stability guarantees?
>>> >>
>>> >> Regards,
>>> >> Mridul
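For context on the annotations Mridul and Wenchen refer to, a minimal Scala sketch of how Spark's stability markers are applied is shown below. It assumes a Spark 3.x dependency on the classpath for org.apache.spark.annotation; the traits themselves are invented for illustration and are not real Spark APIs.

```scala
// Sketch only: the traits are hypothetical; the annotations are the stability
// markers shipped in org.apache.spark.annotation (Spark 3.x assumption).
import org.apache.spark.annotation.{Evolving, Experimental, Stable}

@Stable          // intended to stay compatible across releases
trait ExampleStableReader {
  def read(path: String): Unit
}

@Evolving        // may still change between minor releases
trait ExampleEvolvingReader {
  def readStream(path: String): Unit
}

@Experimental    // no compatibility promise yet
trait ExampleExperimentalReader {
  def readBatch(path: String): Unit
}
```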
>>> >>
>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>>> >> >
>>> >> > I'll start off the vote with a strong +1 (binding).
>>> >> >
>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>>> >> >>
>>> >> >> I propose to add the following text to Spark's Semantic Versioning policy and adopt it as the rubric that should be used when deciding to break APIs (even at major versions such as 3.0).
>>> >> >>
>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As this is a procedural vote, the measure will pass if there are more favourable votes than unfavourable ones. PMC votes are binding, but the community is encouraged to add their voice to the discussion.
>>> >> >>
>>> >> >> [ ] +1 - Spark should adopt this policy.
>>> >> >> [ ] -1 - Spark should not adopt this policy.
>>> >> >>
>>> >> >> <new policy>
>>> >> >>
>>> >> >> Considerations When Breaking APIs
>>> >> >>
>>> >> >> The Spark project strives to avoid breaking APIs or silently changing behavior, even at major versions. While this is not always possible, the balance of the following factors should be considered before choosing to break an API.
>>> >> >>
>>> >> >> Cost of Breaking an API
>>> >> >>
>>> >> >> Breaking an API almost always has a non-trivial cost to the users of Spark. A broken API means that Spark programs need to be rewritten before they can be upgraded. However, there are a few considerations when thinking about what the cost will be:
>>> >> >>
>>> >> >> Usage - An API that is actively used in many different places is always very costly to break. While it is hard to know usage for sure, there are a bunch of ways that we can estimate:
>>> >> >> How long has the API been in Spark?
>>> >> >> Is the API common even for basic programs?
>>> >> >> How often do we see recent questions in JIRA or mailing lists?
>>> >> >> How often does it appear in StackOverflow or blogs?
>>> >> >>
>>> >> >> Behavior after the break - How will a program that works today work after the break? The following are listed roughly in order of increasing severity:
>>> >> >> Will there be a compiler or linker error?
>>> >> >> Will there be a runtime exception?
>>> >> >> Will that exception happen after significant processing has been done?
>>> >> >> Will we silently return different answers? (very hard to debug, might not even notice!)
>>> >> >>
>>> >> >> Cost of Maintaining an API
>>> >> >>
>>> >> >> Of course, the above does not mean that we will never break any APIs. We must also consider the cost, both to the project and to our users, of keeping the API in question.
>>> >> >>
>>> >> >> Project Costs - Every API we have needs to be tested and needs to keep working as other parts of the project change. These costs are significantly exacerbated when external dependencies change (the JVM, Scala, etc.). In some cases, while not completely technically infeasible, the cost of maintaining a particular API can become too high.
>>> >> >>
>>> >> >> User Costs - APIs also have a cognitive cost to users learning Spark or trying to understand Spark programs. This cost becomes even higher when the API in question has confusing or undefined semantics.
>>> >> >>
>>> >> >> Alternatives to Breaking an API
>>> >> >>
>>> >> >> In cases where there is a "Bad API", but where the cost of removal is also high, there are alternatives that should be considered that do not hurt existing users but do address some of the maintenance costs.
>>> >> >>
>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an important point. Anytime we are adding a new interface to Spark we should consider that we might be stuck with this API forever. Think deeply about how new APIs relate to existing ones, as well as how you expect them to evolve over time.
>>> >> >>
>>> >> >> Deprecation Warnings - All deprecation warnings should point to a clear alternative and should never just say that an API is deprecated.
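As a small illustration of that guideline, here is a minimal Scala sketch of a deprecation message that names its replacement; the object and method names are made up and are not real Spark APIs.

```scala
// Hypothetical API used only to illustrate the guideline above.
object ExampleColumnOps {
  def renameColumns(mapping: Map[String, String]): Unit =
    println(s"renaming: $mapping")

  // Good: the message names the replacement and the release the deprecation landed in.
  @deprecated("Use ExampleColumnOps.renameColumns(Map(from -> to)) instead", "3.0.0")
  def renameColumn(from: String, to: String): Unit =
    renameColumns(Map(from -> to))
}

object DeprecationDemo extends App {
  // Compiling with -deprecation prints a warning that points straight at the alternative.
  ExampleColumnOps.renameColumn("old_name", "new_name")
}
```

A message that only said "this method is deprecated" would fail the guideline, since it leaves users to hunt for the alternative themselves.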
>>> >> >> Updated Docs - Documentation should point to the "best" recommended way of performing a given task. In the cases where we maintain legacy documentation, we should clearly point to newer APIs and suggest to users the "right" way.
>>> >> >>
>>> >> >> Community Work - Many people learn Spark by reading blogs and other sites such as StackOverflow. However, many of these resources are out of date. Update them to reduce the cost of eventually removing deprecated APIs.
>>> >> >>
>>> >> >> </new policy>

--
---
Takeshi Yamamuro