+1 (binding)

Xiao
On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <[email protected]> wrote:

> +1 (non-binding)
>
> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <[email protected]> wrote:
>
>> The proposal itself seems good as a set of factors to consider. Thanks,
>> Michael.
>>
>> Several of the concerns mentioned look like good points, in particular:
>>
>> > ... assuming that this is for public stable APIs, not APIs that are
>> > marked as unstable, evolving, etc. ...
>>
>> I would like to confirm this. We already have API annotations such as
>> Experimental, Unstable, etc., and the implication of each is still
>> effective. If it's for stable APIs, it makes sense to me as well.
>>
>> > ... can we expand on 'when' an API change can occur? Since we are
>> > proposing to diverge from semver. ...
>>
>> I think this is a good point. If we're proposing to diverge from semver,
>> the delta compared to semver will have to be clarified to avoid
>> different personal interpretations of the somewhat general principles.
>>
>> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
>> > Apache Spark 3.0+? ...
>>
>> Assuming these concerns will be addressed, +1 (binding).
>>
>> On Mon, Mar 9, 2020 at 4:53 PM Takeshi Yamamuro <[email protected]> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Bests,
>>> Takeshi
>>>
>>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <[email protected]> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> Gengliang
>>>>
>>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <[email protected]> wrote:
>>>>
>>>>> +1 as well.
>>>>>
>>>>> Matei
>>>>>
>>>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <[email protected]> wrote:
>>>>>
>>>>> +1 (binding), assuming that this is for public stable APIs, not APIs
>>>>> that are marked as unstable, evolving, etc.
>>>>>
>>>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <[email protected]> wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> Michael's section on the trade-offs of maintaining / removing an API
>>>>>> is one of the best reads I have seen on this mailing list.
>>>>>> Enthusiastic +1.
>>>>>>
>>>>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <[email protected]> wrote:
>>>>>> >
>>>>>> > This new policy has a good intention, but can we narrow down on
>>>>>> > the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>>>>> >
>>>>>> > I saw that there already exists a reverting PR to bring back Spark
>>>>>> > 1.4 and 1.5 APIs based on this AS-IS suggestion.
>>>>>> >
>>>>>> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty,
>>>>>> > and that's nice.
>>>>>> >
>>>>>> > However, for the other cases, it sounds like `recommending older
>>>>>> > APIs as much as possible` due to the following:
>>>>>> >
>>>>>> > > How long has the API been in Spark?
>>>>>> >
>>>>>> > We had better be more careful when we add a new policy, and should
>>>>>> > aim not to mislead users and 3rd-party library developers into
>>>>>> > thinking "older is better".
>>>>>> >
>>>>>> > Technically, I'm wondering who will use new APIs in their examples
>>>>>> > (in books and on StackOverflow) if they always need to write an
>>>>>> > additional warning like `this only works at 2.4.0+`.
>>>>>> >
>>>>>> > Bests,
>>>>>> > Dongjoon.
>>>>>> >
>>>>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <[email protected]> wrote:
>>>>>> >>
>>>>>> >> I am in broad agreement with the proposal; like any developer, I
>>>>>> >> prefer stable, well-designed APIs :-)
>>>>>> >>
>>>>>> >> Can we tie the proposal to the stability guarantees given by
>>>>>> >> Spark and reasonable expectations from users?
>>>>>> >> In my opinion, an unstable or evolving API could change, while
>>>>>> >> an experimental API which has been around for ages should be
>>>>>> >> handled more conservatively.
>>>>>> >> Which brings into question how the stability guarantees
>>>>>> >> specified by the annotations interact with the proposal.
>>>>>> >>
>>>>>> >> Also, can we expand on 'when' an API change can occur? Since we
>>>>>> >> are proposing to diverge from semver.
>>>>>> >> Patch release? Minor release? Only major release? Based on the
>>>>>> >> 'impact' of the API? Stability guarantees?
>>>>>> >>
>>>>>> >> Regards,
>>>>>> >> Mridul
>>>>>> >>
>>>>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <[email protected]> wrote:
>>>>>> >> >
>>>>>> >> > I'll start off the vote with a strong +1 (binding).
>>>>>> >> >
>>>>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <[email protected]> wrote:
>>>>>> >> >>
>>>>>> >> >> I propose to add the following text to Spark's Semantic
>>>>>> >> >> Versioning policy and adopt it as the rubric that should be
>>>>>> >> >> used when deciding to break APIs (even at major versions such
>>>>>> >> >> as 3.0).
>>>>>> >> >>
>>>>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As
>>>>>> >> >> this is a procedural vote, the measure will pass if there are
>>>>>> >> >> more favourable votes than unfavourable ones. PMC votes are
>>>>>> >> >> binding, but the community is encouraged to add their voice
>>>>>> >> >> to the discussion.
>>>>>> >> >>
>>>>>> >> >> [ ] +1 - Spark should adopt this policy.
>>>>>> >> >> [ ] -1 - Spark should not adopt this policy.
>>>>>> >> >>
>>>>>> >> >> <new policy>
>>>>>> >> >>
>>>>>> >> >> Considerations When Breaking APIs
>>>>>> >> >>
>>>>>> >> >> The Spark project strives to avoid breaking APIs or silently
>>>>>> >> >> changing behavior, even at major versions. While this is not
>>>>>> >> >> always possible, the balance of the following factors should
>>>>>> >> >> be considered before choosing to break an API.
>>>>>> >> >>
>>>>>> >> >> Cost of Breaking an API
>>>>>> >> >>
>>>>>> >> >> Breaking an API almost always has a non-trivial cost to the
>>>>>> >> >> users of Spark. A broken API means that Spark programs need
>>>>>> >> >> to be rewritten before they can be upgraded. However, there
>>>>>> >> >> are a few considerations when thinking about what the cost
>>>>>> >> >> will be:
>>>>>> >> >>
>>>>>> >> >> Usage - an API that is actively used in many different places
>>>>>> >> >> is always very costly to break. While it is hard to know
>>>>>> >> >> usage for sure, there are a bunch of ways that we can
>>>>>> >> >> estimate:
>>>>>> >> >>
>>>>>> >> >> - How long has the API been in Spark?
>>>>>> >> >> - Is the API common even for basic programs?
>>>>>> >> >> - How often do we see recent questions in JIRA or mailing lists?
>>>>>> >> >> - How often does it appear in StackOverflow or blogs?
>>>>>> >> >>
>>>>>> >> >> Behavior after the break - How will a program that works
>>>>>> >> >> today work after the break? The following are listed roughly
>>>>>> >> >> in order of increasing severity:
>>>>>> >> >>
>>>>>> >> >> - Will there be a compiler or linker error?
>>>>>> >> >> - Will there be a runtime exception?
>>>>>> >> >> - Will that exception happen after significant processing has
>>>>>> >> >>   been done?
>>>>>> >> >> - Will we silently return different answers? (very hard to
>>>>>> >> >>   debug; might not even notice!)
>>>>>> >> >>
>>>>>> >> >> Cost of Maintaining an API
>>>>>> >> >>
>>>>>> >> >> Of course, the above does not mean that we will never break
>>>>>> >> >> any APIs. We must also consider the cost, both to the project
>>>>>> >> >> and to our users, of keeping the API in question.
>>>>>> >> >>
>>>>>> >> >> Project Costs - Every API we have needs to be tested and
>>>>>> >> >> needs to keep working as other parts of the project change.
>>>>>> >> >> These costs are significantly exacerbated when external
>>>>>> >> >> dependencies change (the JVM, Scala, etc.). In some cases,
>>>>>> >> >> while not completely technically infeasible, the cost of
>>>>>> >> >> maintaining a particular API can become too high.
>>>>>> >> >>
>>>>>> >> >> User Costs - APIs also have a cognitive cost to users
>>>>>> >> >> learning Spark or trying to understand Spark programs. This
>>>>>> >> >> cost becomes even higher when the API in question has
>>>>>> >> >> confusing or undefined semantics.
>>>>>> >> >>
>>>>>> >> >> Alternatives to Breaking an API
>>>>>> >> >>
>>>>>> >> >> In cases where there is a "Bad API", but where the cost of
>>>>>> >> >> removal is also high, there are alternatives that should be
>>>>>> >> >> considered that do not hurt existing users but do address
>>>>>> >> >> some of the maintenance costs.
>>>>>> >> >>
>>>>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an
>>>>>> >> >> important point. Any time we are adding a new interface to
>>>>>> >> >> Spark, we should consider that we might be stuck with this
>>>>>> >> >> API forever. Think deeply about how new APIs relate to
>>>>>> >> >> existing ones, as well as how you expect them to evolve over
>>>>>> >> >> time.
>>>>>> >> >>
>>>>>> >> >> Deprecation Warnings - All deprecation warnings should point
>>>>>> >> >> to a clear alternative and should never just say that an API
>>>>>> >> >> is deprecated.
>>>>>> >> >>
>>>>>> >> >> Updated Docs - Documentation should point to the "best"
>>>>>> >> >> recommended way of performing a given task. In the cases
>>>>>> >> >> where we maintain legacy documentation, we should clearly
>>>>>> >> >> point to newer APIs and suggest to users the "right" way.
>>>>>> >> >>
>>>>>> >> >> Community Work - Many people learn Spark by reading blogs and
>>>>>> >> >> other sites such as StackOverflow. However, many of these
>>>>>> >> >> resources are out of date. Update them, to reduce the cost of
>>>>>> >> >> eventually removing deprecated APIs.
>>>>>> >> >>
>>>>>> >> >> </new policy>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
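
For reference, the stability annotations that Hyukjin, Wenchen, and Mridul
mention above live in org.apache.spark.annotation. The sketch below shows
how they mark which compatibility expectations apply to an interface; the
trait names and members are hypothetical and for illustration only, while
the annotations themselves are the real Spark ones:

    import org.apache.spark.annotation.{Evolving, Experimental, Unstable}

    // Hypothetical interfaces: what matters here is the annotation, which
    // tells users (and the proposed rubric) how much stability to expect.

    @Experimental // early API; may change or be removed in a future release
    trait QueryHintSupport {
      def withHint(name: String): QueryHintSupport
    }

    @Evolving // stabilizing, but may still change between minor releases
    trait ColumnarScanSupport {
      def batchSize: Int
    }

    @Unstable // no compatibility guarantee at all
    trait InternalRowConverter

    // An un-annotated public API is considered stable, so the
    // considerations in the proposed policy apply to it in full.
    trait StableQuerySupport {
      def run(): Unit
    }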
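
On the "Deprecation Warnings" point of the policy, in Scala this amounts to
filling in both arguments of the standard @deprecated annotation. A minimal
self-contained sketch (the method names are made up for illustration):

    object ExampleApi {
      // Anti-pattern: only says that the API is deprecated.
      //   @deprecated("this method is deprecated", "3.0.0")

      // Better: the message names a concrete replacement, and `since`
      // records the release in which the deprecation landed. The compiler
      // repeats this message at every call site.
      @deprecated("Use `transformRows` instead", since = "3.0.0")
      def mapRows(f: Int => Int): Seq[Int] = transformRows(f)

      // The recommended replacement that the message points to.
      def transformRows(f: Int => Int): Seq[Int] = Seq(1, 2, 3).map(f)
    }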
