+1 (binding) I also assume that the implementation of the proposal will be carried out carefully, case by case, with sufficient open discussion.
Thanks, Dongjoon. On Mon, Mar 9, 2020 at 5:20 PM Holden Karau <hol...@pigscanfly.ca> wrote: > +1 (binding) on the original proposal. > > On Mon, Mar 9, 2020 at 1:32 PM Michael Heuer <heue...@gmail.com> wrote: > >> +1 (non-binding) >> >> I am disappointed however that this only mentions API and not >> dependencies and transitive dependencies. >> > I think upgrading dependencies continues to be reasonable. > >> >> As Spark does not provide separation between its runtime classpath and >> the classpath used by applications, I believe Spark's dependencies and >> transitive dependencies should be considered part of the API for this >> policy. Breaking dependency upgrades and incompatible dependency versions >> are the source of much frustration. >> > I myself have also faced this frustration. I believe we've increased some > shading to help here. Are there specific pain points you've experienced? > Maybe we can factor this discussion into another thread. >> >> > >> michael >> >> >> On Mar 9, 2020, at 2:16 PM, Takuya UESHIN <ues...@happy-camper.st> wrote: >> >> +1 (binding) >> >> >> On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang <jiangxb1...@gmail.com> >> wrote: >> >>> +1 (non-binding) >>> >>> Cheers, >>> >>> Xingbo >>> >>> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li <lix...@databricks.com> wrote: >>> >>>> +1 (binding) >>>> >>>> Xiao >>>> >>>> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <denny.g....@gmail.com> wrote: >>>> >>>>> +1 (non-binding) >>>>> >>>>> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls...@gmail.com> >>>>> wrote: >>>>> >>>>>> The proposal itself seems good as a set of factors to consider. Thanks, >>>>>> Michael. >>>>>> >>>>>> Several concerns mentioned look like good points, in particular: >>>>>> >>>>>> > ... assuming that this is for public stable APIs, not APIs that are >>>>>> marked as unstable, evolving, etc. ... >>>>>> I would like to confirm this. We already have API annotations such as >>>>>> Experimental, Unstable, etc. 
and the implication of each is still >>>>>> effective. If it's for stable APIs, it makes sense to me as well. >>>>>> >>>>>> > ... can we expand on 'when' an API change can occur ? Since we are >>>>>> proposing to diverge from semver. ... >>>>>> I think this is a good point. If we're proposing to diverge >>>>>> from semver, the delta compared to semver will have to be clarified to >>>>>> avoid different personal interpretations of the somewhat general >>>>>> principles. >>>>>> >>>>>> > ... can we narrow down on the migration from Apache Spark 2.4.5 to >>>>>> Apache Spark 3.0+? ... >>>>>> >>>>>> Assuming these concerns will be addressed, +1 (binding). >>>>>> >>>>>> >>>>>> On Mon, Mar 9, 2020 at 4:53 PM Takeshi Yamamuro <linguin....@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> +1 (non-binding) >>>>>>> >>>>>>> Bests, >>>>>>> Takeshi >>>>>>> >>>>>>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang < >>>>>>> gengliang.w...@databricks.com> wrote: >>>>>>> >>>>>>>> +1 (non-binding) >>>>>>>> >>>>>>>> Gengliang >>>>>>>> >>>>>>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia < >>>>>>>> matei.zaha...@gmail.com> wrote: >>>>>>>> >>>>>>>>> +1 as well. >>>>>>>>> >>>>>>>>> Matei >>>>>>>>> >>>>>>>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> +1 (binding), assuming that this is for public stable APIs, not >>>>>>>>> APIs that are marked as unstable, evolving, etc. >>>>>>>>> >>>>>>>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> +1 (non-binding) >>>>>>>>>> >>>>>>>>>> Michael's section on the trade-offs of maintaining / removing an >>>>>>>>>> API is one of >>>>>>>>>> the best reads I have seen on this mailing list. 
Enthusiastic +1 >>>>>>>>>> >>>>>>>>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun < >>>>>>>>>> dongjoon.h...@gmail.com> wrote: >>>>>>>>>> > >>>>>>>>>> > This new policy has a good intention, but can we narrow down on >>>>>>>>>> the migration from Apache Spark 2.4.5 to Apache Spark 3.0+? >>>>>>>>>> > >>>>>>>>>> > I saw that there already exists a reverting PR to bring back >>>>>>>>>> Spark 1.4 and 1.5 APIs based on this AS-IS suggestion. >>>>>>>>>> > >>>>>>>>>> > The AS-IS policy clearly mentions the JVM/Scala-level >>>>>>>>>> difficulty, which is nice. >>>>>>>>>> > >>>>>>>>>> > However, for the other cases, it sounds like `recommending >>>>>>>>>> older APIs as much as possible` due to the following. >>>>>>>>>> > >>>>>>>>>> > > How long has the API been in Spark? >>>>>>>>>> > >>>>>>>>>> > We had better be more careful when we add a new policy and >>>>>>>>>> should aim not to mislead users and 3rd-party library developers >>>>>>>>>> into thinking >>>>>>>>>> "older is better". >>>>>>>>>> > >>>>>>>>>> > Technically, I'm wondering who will use new APIs in their >>>>>>>>>> examples (in books and on StackOverflow) if they always need to write >>>>>>>>>> an additional >>>>>>>>>> warning like `this only works on 2.4.0+`. >>>>>>>>>> > >>>>>>>>>> > Bests, >>>>>>>>>> > Dongjoon. >>>>>>>>>> > >>>>>>>>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan < >>>>>>>>>> mri...@gmail.com> wrote: >>>>>>>>>> >> >>>>>>>>>> >> I am in broad agreement with the proposal; like any developer, I >>>>>>>>>> prefer >>>>>>>>>> >> stable, well-designed APIs :-) >>>>>>>>>> >> >>>>>>>>>> >> Can we tie the proposal to stability guarantees given by Spark >>>>>>>>>> and >>>>>>>>>> >> reasonable expectations from users? >>>>>>>>>> >> In my opinion, an unstable or evolving API could change, while an >>>>>>>>>> >> experimental API which has been around for ages should be handled >>>>>>>>>> >> more conservatively. 
>>>>>>>>>> >> Which raises the question of how the stability guarantees >>>>>>>>>> >> specified by the annotations interact with the proposal. >>>>>>>>>> >> >>>>>>>>>> >> Also, can we expand on 'when' an API change can occur? Since >>>>>>>>>> we are >>>>>>>>>> >> proposing to diverge from semver. >>>>>>>>>> >> Patch release? Minor release? Only major release? Based on >>>>>>>>>> 'impact' >>>>>>>>>> >> of API? Stability guarantees? >>>>>>>>>> >> >>>>>>>>>> >> Regards, >>>>>>>>>> >> Mridul >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust < >>>>>>>>>> mich...@databricks.com> wrote: >>>>>>>>>> >> > >>>>>>>>>> >> > I'll start off the vote with a strong +1 (binding). >>>>>>>>>> >> > >>>>>>>>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust < >>>>>>>>>> mich...@databricks.com> wrote: >>>>>>>>>> >> >> >>>>>>>>>> >> >> I propose to add the following text to Spark's Semantic >>>>>>>>>> Versioning policy and adopt it as the rubric that should be used when >>>>>>>>>> deciding to break APIs (even at major versions such as 3.0). >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. >>>>>>>>>> As this is a procedural vote, the measure will pass if there are more >>>>>>>>>> favourable votes than unfavourable ones. PMC votes are binding, but >>>>>>>>>> the >>>>>>>>>> community is encouraged to add their voice to the discussion. >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> [ ] +1 - Spark should adopt this policy. >>>>>>>>>> >> >> >>>>>>>>>> >> >> [ ] -1 - Spark should not adopt this policy. >>>>>>>>>> >> >> >>>>>>>>>> >> >> <new policy> >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> Considerations When Breaking APIs >>>>>>>>>> >> >> >>>>>>>>>> >> >> The Spark project strives to avoid breaking APIs or >>>>>>>>>> silently changing behavior, even at major versions. 
While this is not >>>>>>>>>> always possible, the balance of the following factors should be >>>>>>>>>> considered >>>>>>>>>> before choosing to break an API. >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> Cost of Breaking an API >>>>>>>>>> >> >> >>>>>>>>>> >> >> Breaking an API almost always has a non-trivial cost to the >>>>>>>>>> users of Spark. A broken API means that Spark programs need to be >>>>>>>>>> rewritten >>>>>>>>>> before they can be upgraded. However, there are a few considerations >>>>>>>>>> when >>>>>>>>>> thinking about what the cost will be: >>>>>>>>>> >> >> >>>>>>>>>> >> >> Usage - an API that is actively used in many different >>>>>>>>>> places is always very costly to break. While it is hard to know >>>>>>>>>> usage for >>>>>>>>>> sure, there are a bunch of ways that we can estimate: >>>>>>>>>> >> >> >>>>>>>>>> >> >> How long has the API been in Spark? >>>>>>>>>> >> >> >>>>>>>>>> >> >> Is the API common even for basic programs? >>>>>>>>>> >> >> >>>>>>>>>> >> >> How often do we see recent questions in JIRA or mailing >>>>>>>>>> lists? >>>>>>>>>> >> >> >>>>>>>>>> >> >> How often does it appear in StackOverflow or blogs? >>>>>>>>>> >> >> >>>>>>>>>> >> >> Behavior after the break - How will a program that works >>>>>>>>>> today work after the break? The following are listed roughly in >>>>>>>>>> order of >>>>>>>>>> increasing severity: >>>>>>>>>> >> >> >>>>>>>>>> >> >> Will there be a compiler or linker error? >>>>>>>>>> >> >> >>>>>>>>>> >> >> Will there be a runtime exception? >>>>>>>>>> >> >> >>>>>>>>>> >> >> Will that exception happen after significant processing has >>>>>>>>>> been done? >>>>>>>>>> >> >> >>>>>>>>>> >> >> Will we silently return different answers? (very hard to >>>>>>>>>> debug, might not even notice!) >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> Cost of Maintaining an API >>>>>>>>>> >> >> >>>>>>>>>> >> >> Of course, the above does not mean that we will never break >>>>>>>>>> any APIs. 
We must also consider the cost both to the project and to >>>>>>>>>> our >>>>>>>>>> users of keeping the API in question. >>>>>>>>>> >> >> >>>>>>>>>> >> >> Project Costs - Every API we have needs to be tested and >>>>>>>>>> needs to keep working as other parts of the project change. These >>>>>>>>>> costs >>>>>>>>>> are significantly exacerbated when external dependencies change (the >>>>>>>>>> JVM, >>>>>>>>>> Scala, etc). In some cases, while not completely technically >>>>>>>>>> infeasible, >>>>>>>>>> the cost of maintaining a particular API can become too high. >>>>>>>>>> >> >> >>>>>>>>>> >> >> User Costs - APIs also have a cognitive cost to users >>>>>>>>>> learning Spark or trying to understand Spark programs. This cost >>>>>>>>>> becomes >>>>>>>>>> even higher when the API in question has confusing or undefined >>>>>>>>>> semantics. >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> Alternatives to Breaking an API >>>>>>>>>> >> >> >>>>>>>>>> >> >> In cases where there is a "Bad API", but where the cost of >>>>>>>>>> removal is also high, there are alternatives that should be >>>>>>>>>> considered that >>>>>>>>>> do not hurt existing users but do address some of the maintenance >>>>>>>>>> costs. >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an >>>>>>>>>> important point. Anytime we are adding a new interface to Spark we >>>>>>>>>> should >>>>>>>>>> consider that we might be stuck with this API forever. Think deeply >>>>>>>>>> about >>>>>>>>>> how new APIs relate to existing ones, as well as how you expect them >>>>>>>>>> to >>>>>>>>>> evolve over time. >>>>>>>>>> >> >> >>>>>>>>>> >> >> Deprecation Warnings - All deprecation warnings should >>>>>>>>>> point to a clear alternative and should never just say that an API is >>>>>>>>>> deprecated. >>>>>>>>>> >> >> >>>>>>>>>> >> >> Updated Docs - Documentation should point to the "best" >>>>>>>>>> recommended way of performing a given task. 
In the cases where we >>>>>>>>>> maintain >>>>>>>>>> legacy documentation, we should clearly point to newer APIs and >>>>>>>>>> suggest to >>>>>>>>>> users the "right" way. >>>>>>>>>> >> >> >>>>>>>>>> >> >> Community Work - Many people learn Spark by reading blogs >>>>>>>>>> and other sites such as StackOverflow. However, many of these >>>>>>>>>> resources are >>>>>>>>>> out of date. Update them to reduce the cost of eventually removing >>>>>>>>>> deprecated APIs. >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> </new policy> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >>>>>>>>>> >> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> --- >>>>>>> Takeshi Yamamuro >>>>>>> >>>>>> >>>> >>>> -- >>>> <https://databricks.com/sparkaisummit/north-america> >>>> >>> >> >> -- >> Takuya UESHIN >> >> http://twitter.com/ueshin >> >> >> > > -- > Twitter: https://twitter.com/holdenkarau > Books (Learning Spark, High Performance Spark, etc.): > https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau >
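[Editor's note] The "Deprecation Warnings" point in the proposed policy above (warnings must point to a clear alternative, never just say that an API is deprecated) is the most mechanical of the guidelines, so a minimal sketch may help. This is illustrative Python, not Spark code, and the function names `parse_json` and `parse_json_strict` are hypothetical:

```python
import json
import warnings


def parse_json(text):
    """Hypothetical deprecated API. Per the policy, the warning names a
    concrete replacement instead of merely saying "deprecated"."""
    warnings.warn(
        "parse_json is deprecated and will be removed in the next major "
        "release; use parse_json_strict instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    # Delegate to the replacement so old callers keep working meanwhile.
    return parse_json_strict(text)


def parse_json_strict(text):
    """The recommended replacement that the warning points users to."""
    return json.loads(text)
```

The key detail is that the warning message names the replacement, so a user who sees it in a log knows exactly what to migrate to without searching docs or mailing lists.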