+1 (non-binding)

Cheers,
Xingbo

On Mon, Mar 9, 2020 at 9:35 AM Xiao Li <lix...@databricks.com> wrote:
> +1 (binding)
>
> Xiao
>
> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <denny.g....@gmail.com> wrote:
>> +1 (non-binding)
>>
>> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>> The proposal itself seems good as to the factors to consider. Thanks,
>>> Michael.
>>>
>>> Several of the concerns mentioned are good points, in particular:
>>>
>>> > ... assuming that this is for public stable APIs, not APIs that are
>>> > marked as unstable, evolving, etc. ...
>>> I would like to confirm this. We already have API annotations such as
>>> Experimental, Unstable, etc., and the implication of each is still
>>> effective. If it's for stable APIs, it makes sense to me as well.
>>>
>>> > ... can we expand on 'when' an API change can occur? Since we are
>>> > proposing to diverge from semver. ...
>>> I think this is a good point. If we're proposing to diverge from
>>> semver, the delta compared to semver will have to be clarified to
>>> avoid different personal interpretations of the somewhat general
>>> principles.
>>>
>>> > ... can we narrow down on the migration from Apache Spark 2.4.5 to
>>> > Apache Spark 3.0+? ...
>>>
>>> Assuming these concerns will be addressed, +1 (binding).
>>>
>>> On Mon, Mar 9, 2020 at 4:53 PM Takeshi Yamamuro <linguin....@gmail.com> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>> Bests,
>>>> Takeshi
>>>>
>>>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <gengliang.w...@databricks.com> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Gengliang
>>>>>
>>>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>>>>
>>>>>> +1 as well.
>>>>>>
>>>>>> Matei
>>>>>>
>>>>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>>
>>>>>> +1 (binding), assuming that this is for public stable APIs, not
>>>>>> APIs that are marked as unstable, evolving, etc.
>>>>>>
>>>>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> Michael's section on the trade-offs of maintaining / removing an
>>>>>>> API is one of the best reads I have seen on this mailing list. An
>>>>>>> enthusiastic +1.
>>>>>>>
>>>>>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>> >
>>>>>>> > This new policy has a good intention, but can we narrow down on
>>>>>>> > the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>>>>>>> >
>>>>>>> > I saw that there already exists a reverting PR to bring back
>>>>>>> > Spark 1.4 and 1.5 APIs based on this AS-IS suggestion.
>>>>>>> >
>>>>>>> > The AS-IS policy clearly mentions the JVM/Scala-level
>>>>>>> > difficulty, and that's nice.
>>>>>>> >
>>>>>>> > However, for the other cases, it sounds like `recommending older
>>>>>>> > APIs as much as possible` due to the following:
>>>>>>> >
>>>>>>> > > How long has the API been in Spark?
>>>>>>> >
>>>>>>> > We had better be more careful when we add a new policy, and
>>>>>>> > should aim not to mislead users and 3rd-party library developers
>>>>>>> > into thinking "older is better".
>>>>>>> >
>>>>>>> > Technically, I'm wondering who will use new APIs in their
>>>>>>> > examples (in books and on StackOverflow) if they always need to
>>>>>>> > write an additional warning like `this only works at 2.4.0+`.
>>>>>>> >
>>>>>>> > Bests,
>>>>>>> > Dongjoon.
>>>>>>> >
>>>>>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>>>> >>
>>>>>>> >> I am in broad agreement with the proposal; as any developer, I
>>>>>>> >> prefer stable, well-designed APIs :-)
>>>>>>> >>
>>>>>>> >> Can we tie the proposal to the stability guarantees given by
>>>>>>> >> Spark and reasonable expectations from users?
>>>>>>> >> In my opinion, an unstable or evolving API could change, while
>>>>>>> >> an experimental API which has been around for ages should be
>>>>>>> >> handled more conservatively.
>>>>>>> >> Which brings into question how the stability guarantees
>>>>>>> >> specified by the annotations interact with the proposal.
>>>>>>> >>
>>>>>>> >> Also, can we expand on 'when' an API change can occur? Since we
>>>>>>> >> are proposing to diverge from semver.
>>>>>>> >> Patch release? Minor release? Only major release? Based on
>>>>>>> >> 'impact' of the API? Stability guarantees?
>>>>>>> >>
>>>>>>> >> Regards,
>>>>>>> >> Mridul
>>>>>>> >>
>>>>>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>>>>>>> >> >
>>>>>>> >> > I'll start off the vote with a strong +1 (binding).
>>>>>>> >> >
>>>>>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>>>>>>> >> >>
>>>>>>> >> >> I propose to add the following text to Spark's Semantic
>>>>>>> >> >> Versioning policy and adopt it as the rubric that should be
>>>>>>> >> >> used when deciding to break APIs (even at major versions
>>>>>>> >> >> such as 3.0).
>>>>>>> >> >>
>>>>>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm.
>>>>>>> >> >> As this is a procedural vote, the measure will pass if there
>>>>>>> >> >> are more favourable votes than unfavourable ones. PMC votes
>>>>>>> >> >> are binding, but the community is encouraged to add their
>>>>>>> >> >> voice to the discussion.
>>>>>>> >> >>
>>>>>>> >> >> [ ] +1 - Spark should adopt this policy.
>>>>>>> >> >> [ ] -1 - Spark should not adopt this policy.
>>>>>>> >> >>
>>>>>>> >> >> <new policy>
>>>>>>> >> >>
>>>>>>> >> >> Considerations When Breaking APIs
>>>>>>> >> >>
>>>>>>> >> >> The Spark project strives to avoid breaking APIs or silently
>>>>>>> >> >> changing behavior, even at major versions.
>>>>>>> >> >> While this is not always possible, the balance of the
>>>>>>> >> >> following factors should be considered before choosing to
>>>>>>> >> >> break an API.
>>>>>>> >> >>
>>>>>>> >> >> Cost of Breaking an API
>>>>>>> >> >>
>>>>>>> >> >> Breaking an API almost always has a non-trivial cost to the
>>>>>>> >> >> users of Spark. A broken API means that Spark programs need
>>>>>>> >> >> to be rewritten before they can be upgraded. However, there
>>>>>>> >> >> are a few considerations when thinking about what the cost
>>>>>>> >> >> will be:
>>>>>>> >> >>
>>>>>>> >> >> Usage - an API that is actively used in many different
>>>>>>> >> >> places is always very costly to break. While it is hard to
>>>>>>> >> >> know usage for sure, there are a bunch of ways that we can
>>>>>>> >> >> estimate:
>>>>>>> >> >> - How long has the API been in Spark?
>>>>>>> >> >> - Is the API common even for basic programs?
>>>>>>> >> >> - How often do we see recent questions in JIRA or mailing
>>>>>>> >> >>   lists?
>>>>>>> >> >> - How often does it appear in StackOverflow or blogs?
>>>>>>> >> >>
>>>>>>> >> >> Behavior after the break - How will a program that works
>>>>>>> >> >> today work after the break? The following are listed roughly
>>>>>>> >> >> in order of increasing severity:
>>>>>>> >> >> - Will there be a compiler or linker error?
>>>>>>> >> >> - Will there be a runtime exception?
>>>>>>> >> >> - Will that exception happen after significant processing
>>>>>>> >> >>   has been done?
>>>>>>> >> >> - Will we silently return different answers? (very hard to
>>>>>>> >> >>   debug; might not even notice!)
>>>>>>> >> >>
>>>>>>> >> >> Cost of Maintaining an API
>>>>>>> >> >>
>>>>>>> >> >> Of course, the above does not mean that we will never break
>>>>>>> >> >> any APIs. We must also consider the cost, both to the
>>>>>>> >> >> project and to our users, of keeping the API in question.
>>>>>>> >> >>
>>>>>>> >> >> Project Costs - Every API we have needs to be tested and
>>>>>>> >> >> needs to keep working as other parts of the project change.
>>>>>>> >> >> These costs are significantly exacerbated when external
>>>>>>> >> >> dependencies change (the JVM, Scala, etc.). In some cases,
>>>>>>> >> >> while not completely technically infeasible, the cost of
>>>>>>> >> >> maintaining a particular API can become too high.
>>>>>>> >> >>
>>>>>>> >> >> User Costs - APIs also have a cognitive cost to users
>>>>>>> >> >> learning Spark or trying to understand Spark programs. This
>>>>>>> >> >> cost becomes even higher when the API in question has
>>>>>>> >> >> confusing or undefined semantics.
>>>>>>> >> >>
>>>>>>> >> >> Alternatives to Breaking an API
>>>>>>> >> >>
>>>>>>> >> >> In cases where there is a "Bad API", but where the cost of
>>>>>>> >> >> removal is also high, there are alternatives that should be
>>>>>>> >> >> considered that do not hurt existing users but do address
>>>>>>> >> >> some of the maintenance costs.
>>>>>>> >> >>
>>>>>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an
>>>>>>> >> >> important point. Any time we are adding a new interface to
>>>>>>> >> >> Spark we should consider that we might be stuck with this
>>>>>>> >> >> API forever. Think deeply about how new APIs relate to
>>>>>>> >> >> existing ones, as well as how you expect them to evolve over
>>>>>>> >> >> time.
>>>>>>> >> >>
>>>>>>> >> >> Deprecation Warnings - All deprecation warnings should point
>>>>>>> >> >> to a clear alternative and should never just say that an API
>>>>>>> >> >> is deprecated.
>>>>>>> >> >>
>>>>>>> >> >> Updated Docs - Documentation should point to the "best"
>>>>>>> >> >> recommended way of performing a given task. In the cases
>>>>>>> >> >> where we maintain legacy documentation, we should clearly
>>>>>>> >> >> point to newer APIs and suggest to users the "right" way.
>>>>>>> >> >>
>>>>>>> >> >> Community Work - Many people learn Spark by reading blogs
>>>>>>> >> >> and other sites such as StackOverflow.
>>>>>>> >> >> However, many of these resources are out of date. Update
>>>>>>> >> >> them, to reduce the cost of eventually removing deprecated
>>>>>>> >> >> APIs.
>>>>>>> >> >>
>>>>>>> >> >> </new policy>
>>>>>>> >>
>>>>>>> >> ---------------------------------------------------------------------
>>>>>>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>
>>>>
>>>> --
>>>> ---
>>>> Takeshi Yamamuro
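The "Deprecation Warnings" guidance in the proposed policy above can be made concrete with a short sketch. The class and method names below are hypothetical, not real Spark APIs; the point is only the shape of a deprecation that names a clear alternative rather than merely saying "deprecated".

```java
import java.util.Arrays;
import java.util.List;

public class DeprecationExample {

    /**
     * @deprecated Use {@link #collectAsList()} instead; it returns a typed
     * List rather than a raw array. Per the policy, the message names the
     * replacement and never just says the API is deprecated.
     */
    @Deprecated
    public static int[] collect() {
        return new int[] {1, 2, 3};
    }

    /** The recommended replacement the deprecation message points to. */
    public static List<Integer> collectAsList() {
        return Arrays.asList(1, 2, 3);
    }

    public static void main(String[] args) {
        // Both entry points keep working; callers get a compile-time
        // deprecation warning (not an error) when using the old one.
        System.out.println(collectAsList());
    }
}
```

Under this style, removal of `collect()` could then wait until the migration cost to users is low, which is exactly the trade-off the policy asks us to weigh.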