+1 (non-binding)

On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
> The proposal itself seems good as the factors to consider. Thanks, Michael.
>
> Several of the concerns mentioned look like good points, in particular:
>
> > ... assuming that this is for public stable APIs, not APIs that are
> > marked as unstable, evolving, etc. ...
>
> I would like to confirm this. We already have API annotations such as
> Experimental, Unstable, etc., and the implication of each is still
> effective. If it's for stable APIs, it makes sense to me as well.
>
> > ... can we expand on 'when' an API change can occur ? Since we are
> > proposing to diverge from semver. ...
>
> I think this is a good point. If we're proposing to diverge from semver,
> the delta compared to semver will have to be clarified to avoid different
> personal interpretations of the somewhat general principles.
>
> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
> > Apache Spark 3.0+? ...
>
> Assuming these concerns will be addressed, +1 (binding).
>
> On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote:
>
>> +1 (non-binding)
>>
>> Bests,
>> Takeshi
>>
>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <gengliang.w...@databricks.com> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Gengliang
>>>
>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>>
>>>> +1 as well.
>>>>
>>>> Matei
>>>>
>>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>
>>>> +1 (binding), assuming that this is for public stable APIs, not APIs
>>>> that are marked as unstable, evolving, etc.
>>>>
>>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Michael's section on the trade-offs of maintaining / removing an API
>>>>> is one of the best reads I have seen on this mailing list.
>>>>> Enthusiastic +1.
>>>>>
>>>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>> >
>>>>> > This new policy has a good intention, but can we narrow down on the
>>>>> > migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>>>> >
>>>>> > I saw that there already exists a reverting PR to bring back Spark
>>>>> > 1.4 and 1.5 APIs based on this AS-IS suggestion.
>>>>> >
>>>>> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty,
>>>>> > and that's nice.
>>>>> >
>>>>> > However, for the other cases, it sounds like `recommending older
>>>>> > APIs as much as possible` due to the following:
>>>>> >
>>>>> > > How long has the API been in Spark?
>>>>> >
>>>>> > We had better be more careful when we add a new policy, and should
>>>>> > aim not to mislead users and 3rd-party library developers into
>>>>> > thinking "older is better".
>>>>> >
>>>>> > Technically, I'm wondering who will use new APIs in their examples
>>>>> > (in books and on StackOverflow) if they always need to write an
>>>>> > additional warning like `this only works at 2.4.0+`.
>>>>> >
>>>>> > Bests,
>>>>> > Dongjoon.
>>>>> >
>>>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>> >>
>>>>> >> I am in broad agreement with the proposal; like any developer, I
>>>>> >> prefer stable, well-designed APIs :-)
>>>>> >>
>>>>> >> Can we tie the proposal to the stability guarantees given by Spark
>>>>> >> and reasonable expectations from users?
>>>>> >> In my opinion, an unstable or evolving API could change, while an
>>>>> >> experimental API which has been around for ages should be handled
>>>>> >> more conservatively.
>>>>> >> This brings into question how the stability guarantees specified
>>>>> >> by the annotations interact with the proposal.
>>>>> >>
>>>>> >> Also, can we expand on 'when' an API change can occur, since we
>>>>> >> are proposing to diverge from semver?
>>>>> >> Patch release? Minor release? Only major release? Based on
>>>>> >> 'impact' of the API? Stability guarantees?
>>>>> >>
>>>>> >> Regards,
>>>>> >> Mridul
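[For context on the annotations discussed above: a minimal sketch of how
stability markers attach to an API, assuming the annotations in Spark's
org.apache.spark.annotation package. The API names here are hypothetical,
for illustration only.]

    import org.apache.spark.annotation.{Evolving, Experimental}

    // Hypothetical API, for illustration only. An @Experimental API may
    // change or be removed in any release, so the proposed rubric for
    // breaking stable APIs would not constrain it.
    @Experimental
    class VectorizedShuffleWriter {
      def write(records: Iterator[(Int, Array[Byte])]): Unit = ???
    }

    // An @Evolving API is more settled but may still change between
    // minor releases; only unannotated (stable) public APIs would get
    // the full protection of the proposed policy.
    @Evolving
    trait ShuffleMetricsReporter {
      def bytesWritten: Long
    }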
>>>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>>>>> >> >
>>>>> >> > I'll start off the vote with a strong +1 (binding).
>>>>> >> >
>>>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>>>>> >> >>
>>>>> >> >> I propose to add the following text to Spark's Semantic
>>>>> >> >> Versioning policy and adopt it as the rubric that should be
>>>>> >> >> used when deciding to break APIs (even at major versions such
>>>>> >> >> as 3.0).
>>>>> >> >>
>>>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As
>>>>> >> >> this is a procedural vote, the measure will pass if there are
>>>>> >> >> more favourable votes than unfavourable ones. PMC votes are
>>>>> >> >> binding, but the community is encouraged to add their voice to
>>>>> >> >> the discussion.
>>>>> >> >>
>>>>> >> >> [ ] +1 - Spark should adopt this policy.
>>>>> >> >> [ ] -1 - Spark should not adopt this policy.
>>>>> >> >>
>>>>> >> >> <new policy>
>>>>> >> >>
>>>>> >> >> Considerations When Breaking APIs
>>>>> >> >>
>>>>> >> >> The Spark project strives to avoid breaking APIs or silently
>>>>> >> >> changing behavior, even at major versions. While this is not
>>>>> >> >> always possible, the balance of the following factors should
>>>>> >> >> be considered before choosing to break an API.
>>>>> >> >>
>>>>> >> >> Cost of Breaking an API
>>>>> >> >>
>>>>> >> >> Breaking an API almost always has a non-trivial cost to the
>>>>> >> >> users of Spark. A broken API means that Spark programs need to
>>>>> >> >> be rewritten before they can be upgraded. However, there are a
>>>>> >> >> few considerations when thinking about what the cost will be:
>>>>> >> >>
>>>>> >> >> Usage - an API that is actively used in many different places
>>>>> >> >> is always very costly to break. While it is hard to know usage
>>>>> >> >> for sure, there are a number of ways that we can estimate:
>>>>> >> >>
>>>>> >> >> How long has the API been in Spark?
>>>>> >> >>
>>>>> >> >> Is the API common even for basic programs?
>>>>> >> >>
>>>>> >> >> How often do we see recent questions in JIRA or mailing lists?
>>>>> >> >>
>>>>> >> >> How often does it appear in StackOverflow or blogs?
>>>>> >> >>
>>>>> >> >> Behavior after the break - How will a program that works today
>>>>> >> >> work after the break? The following are listed roughly in
>>>>> >> >> order of increasing severity:
>>>>> >> >>
>>>>> >> >> Will there be a compiler or linker error?
>>>>> >> >>
>>>>> >> >> Will there be a runtime exception?
>>>>> >> >>
>>>>> >> >> Will that exception happen after significant processing has
>>>>> >> >> been done?
>>>>> >> >>
>>>>> >> >> Will we silently return different answers? (very hard to
>>>>> >> >> debug; might not even notice!)
>>>>> >> >>
>>>>> >> >> Cost of Maintaining an API
>>>>> >> >>
>>>>> >> >> Of course, the above does not mean that we will never break
>>>>> >> >> any APIs. We must also consider the cost, both to the project
>>>>> >> >> and to our users, of keeping the API in question.
>>>>> >> >>
>>>>> >> >> Project Costs - Every API we have needs to be tested and needs
>>>>> >> >> to keep working as other parts of the project change. These
>>>>> >> >> costs are significantly exacerbated when external dependencies
>>>>> >> >> change (the JVM, Scala, etc.). In some cases, while not
>>>>> >> >> completely technically infeasible, the cost of maintaining a
>>>>> >> >> particular API can become too high.
>>>>> >> >>
>>>>> >> >> User Costs - APIs also have a cognitive cost to users learning
>>>>> >> >> Spark or trying to understand Spark programs. This cost
>>>>> >> >> becomes even higher when the API in question has confusing or
>>>>> >> >> undefined semantics.
>>>>> >> >>
>>>>> >> >> Alternatives to Breaking an API
>>>>> >> >>
>>>>> >> >> In cases where there is a "Bad API", but where the cost of
>>>>> >> >> removal is also high, there are alternatives that should be
>>>>> >> >> considered that do not hurt existing users but do address some
>>>>> >> >> of the maintenance costs.
>>>>> >> >>
>>>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an
>>>>> >> >> important point. Any time we are adding a new interface to
>>>>> >> >> Spark, we should consider that we might be stuck with this API
>>>>> >> >> forever. Think deeply about how new APIs relate to existing
>>>>> >> >> ones, as well as how you expect them to evolve over time.
>>>>> >> >>
>>>>> >> >> Deprecation Warnings - All deprecation warnings should point
>>>>> >> >> to a clear alternative and should never just say that an API
>>>>> >> >> is deprecated (a brief sketch follows the policy text).
>>>>> >> >>
>>>>> >> >> Updated Docs - Documentation should point to the "best"
>>>>> >> >> recommended way of performing a given task. In the cases where
>>>>> >> >> we maintain legacy documentation, we should clearly point to
>>>>> >> >> newer APIs and suggest to users the "right" way.
>>>>> >> >>
>>>>> >> >> Community Work - Many people learn Spark by reading blogs and
>>>>> >> >> other sites such as StackOverflow. However, many of these
>>>>> >> >> resources are out of date. Update them, to reduce the cost of
>>>>> >> >> eventually removing deprecated APIs.
>>>>> >> >>
>>>>> >> >> </new policy>
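[A minimal sketch of the deprecation-warning guideline in the policy
above, using Scala's standard @deprecated annotation. The object and
method names are hypothetical, for illustration only.]

    object LegacyCollectors {  // hypothetical API, for illustration only

      // Bad: the message only says the method is deprecated and gives
      // callers no migration path.
      @deprecated("collectAll is deprecated", "3.0.0")
      def collectAll(): Seq[String] = collectAsSeq()

      // Good: the message names a clear alternative, and the second
      // argument records the release the warning first appeared in.
      @deprecated("use collectAsSeq instead", "3.0.0")
      def collectLegacy(): Seq[String] = collectAsSeq()

      // The recommended replacement that the warning points to.
      def collectAsSeq(): Seq[String] = Seq.empty
    }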
>>
>> --
>> ---
>> Takeshi Yamamuro