+1

On Mon, Mar 9, 2020 at 3:53 PM, John Zhuge <jzh...@apache.org> wrote:

+1 (non-binding)

--
John Zhuge

On Mon, Mar 9, 2020 at 1:32 PM, Michael Heuer <heuermh@gmail.com> wrote:

+1 (non-binding)

I am disappointed, however, that this only mentions the API and not dependencies and transitive dependencies.

As Spark does not provide separation between its runtime classpath and the classpath used by applications, I believe Spark's dependencies and transitive dependencies should be considered part of the API for this policy. Breaking dependency upgrades and incompatible dependency versions are the source of much frustration.

michael

On Mar 9, 2020, at 2:16 PM, Takuya UESHIN <ueshin@happy-camper.st> wrote:

+1 (binding)

--
Takuya UESHIN
http://twitter.com/ueshin

On Mon, Mar 9, 2020 at 11:49 AM, Xingbo Jiang <jiangxb1987@gmail.com> wrote:

+1 (non-binding)

Cheers,
Xingbo

On Mon, Mar 9, 2020 at 9:35 AM, Xiao Li <lixiao@databricks.com> wrote:

+1 (binding)

Xiao

On Mon, Mar 9, 2020 at 8:33 AM, Denny Lee <denny.g.lee@gmail.com> wrote:

+1 (non-binding)

On Mon, Mar 9, 2020 at 1:59 AM, Hyukjin Kwon <gurwls223@gmail.com> wrote:

The proposal itself seems good as the factors to consider. Thanks, Michael.

Several of the concerns mentioned look like good points, in particular:

> ... assuming that this is for public stable APIs, not APIs that are marked as unstable, evolving, etc. ...

I would like to confirm this. We already have API annotations such as Experimental, Unstable, etc., and the implication of each is still effective. If it's for stable APIs, it makes sense to me as well.

> ... can we expand on 'when' an API change can occur? Since we are proposing to diverge from semver. ...

I think this is a good point. If we're proposing to diverge from semver, the delta compared to semver will have to be clarified to avoid different personal interpretations of the somewhat general principles.

> ... can we narrow down on the migration from Apache Spark 2.4.5 to Apache Spark 3.0+? ...

Assuming these concerns will be addressed, +1 (binding).

On Mon, Mar 9, 2020 at 4:53 PM, Takeshi Yamamuro <linguin.m.s@gmail.com> wrote:

+1 (non-binding)

Bests,
Takeshi

On Mon, Mar 9, 2020 at 4:52 PM, Gengliang Wang <gengliang.wang@databricks.com> wrote:

+1 (non-binding)

Gengliang

On Mon, Mar 9, 2020 at 12:22 AM, Matei Zaharia <matei.zaharia@gmail.com> wrote:

+1 as well.

Matei

On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0fan@gmail.com> wrote:

+1 (binding), assuming that this is for public stable APIs, not APIs that are marked as unstable, evolving, etc.

On Mon, Mar 9, 2020 at 1:10 AM, Ismaël Mejía <iemejia@gmail.com> wrote:

+1 (non-binding)

Michael's section on the trade-offs of maintaining / removing an API is one of the best reads I have seen on this mailing list. Enthusiastic +1.

On Sat, Mar 7, 2020 at 8:28 PM, Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote:

This new policy has a good intention, but can we narrow down on the migration from Apache Spark 2.4.5 to Apache Spark 3.0+?

I saw that there already exists a reverting PR to bring back the Spark 1.4 and 1.5 APIs based on this as-is suggestion.

The as-is policy clearly mentions the JVM/Scala-level difficulty, and that's nice.

However, for the other cases, it sounds like `recommending older APIs as much as possible` due to the following:

> How long has the API been in Spark?

We had better be more careful when we add a new policy, and should aim not to mislead users and third-party library developers into thinking "older is better".

Technically, I'm wondering who will use new APIs in their examples (in books and on StackOverflow) if they always need to add a warning like `this only works at 2.4.0+`.

Bests,
Dongjoon.

On Fri, Mar 6, 2020 at 7:10 PM, Mridul Muralidharan <mridul@gmail.com> wrote:

I am in broad agreement with the proposal; like any developer, I prefer stable, well-designed APIs :-)

Can we tie the proposal to the stability guarantees given by Spark and the reasonable expectations of users? In my opinion, an unstable or evolving API could change, while an experimental API which has been around for ages should be handled more conservatively. Which raises the question of how the stability guarantees specified by annotations interact with the proposal.

Also, can we expand on 'when' an API change can occur, since we are proposing to diverge from semver? Patch release? Minor release? Only major release? Based on the 'impact' of the API? Stability guarantees?

Regards,
Mridul

On Fri, Mar 6, 2020 at 7:01 PM, Michael Armbrust <michael@databricks.com> wrote:

I'll start off the vote with a strong +1 (binding).

On Fri, Mar 6, 2020 at 1:01 PM, Michael Armbrust <michael@databricks.com> wrote:

I propose to add the following text to Spark's Semantic Versioning policy and adopt it as the rubric that should be used when deciding to break APIs (even at major versions such as 3.0).

I'll leave the vote open until Tuesday, March 10th at 2pm. As this is a procedural vote, the measure will pass if there are more favourable votes than unfavourable ones. PMC votes are binding, but the community is encouraged to add their voice to the discussion.

[ ] +1 - Spark should adopt this policy.
[ ] -1 - Spark should not adopt this policy.

<new policy>

Considerations When Breaking APIs

The Spark project strives to avoid breaking APIs or silently changing behavior, even at major versions. While this is not always possible, the balance of the following factors should be considered before choosing to break an API.

Cost of Breaking an API

Breaking an API almost always has a non-trivial cost to the users of Spark. A broken API means that Spark programs need to be rewritten before they can be upgraded. However, there are a few considerations when thinking about what the cost will be:

Usage - an API that is actively used in many different places is always very costly to break. While it is hard to know usage for sure, there are a number of ways that we can estimate:
- How long has the API been in Spark?
- Is the API common even for basic programs?
- How often do we see recent questions in JIRA or on mailing lists?
- How often does it appear in StackOverflow or blogs?

Behavior after the break - How will a program that works today work after the break? The following are listed roughly in order of increasing severity:
- Will there be a compiler or linker error?
- Will there be a runtime exception?
- Will that exception happen after significant processing has been done?
- Will we silently return different answers? (Very hard to debug; users might not even notice!)

Cost of Maintaining an API

Of course, the above does not mean that we will never break any APIs. We must also consider the cost, both to the project and to our users, of keeping the API in question.

Project Costs - Every API we have needs to be tested and needs to keep working as other parts of the project change. These costs are significantly exacerbated when external dependencies change (the JVM, Scala, etc.). In some cases, while not completely technically infeasible, the cost of maintaining a particular API can become too high.

User Costs - APIs also have a cognitive cost to users learning Spark or trying to understand Spark programs. This cost becomes even higher when the API in question has confusing or undefined semantics.

Alternatives to Breaking an API

In cases where there is a "bad API", but where the cost of removal is also high, there are alternatives that should be considered that do not hurt existing users but do address some of the maintenance costs.

Avoid Bad APIs - While this is a bit obvious, it is an important point. Any time we are adding a new interface to Spark, we should consider that we might be stuck with this API forever. Think deeply about how new APIs relate to existing ones, as well as how you expect them to evolve over time.

Deprecation Warnings - All deprecation warnings should point to a clear alternative and should never just say that an API is deprecated.

Updated Docs - Documentation should point to the "best" recommended way of performing a given task. In cases where we maintain legacy documentation, we should clearly point to newer APIs and suggest to users the "right" way.

Community Work - Many people learn Spark by reading blogs and other sites such as StackOverflow. However, many of these resources are out of date. Update them, to reduce the cost of eventually removing deprecated APIs.

</new policy>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
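[Editor's note] The "Deprecation Warnings" alternative in the proposed policy (keep the old API working, and have the warning name a concrete replacement) can be sketched in a few lines of Java. This is a hypothetical illustration, not Spark code; `TextUtils`, `cleanString`, and `normalizeWhitespace` are made-up names.

```java
// Hypothetical sketch of the "Deprecation Warnings" guidance: the
// deprecated method keeps working by delegating to its replacement,
// and the Javadoc names a clear alternative rather than just saying
// that the API is deprecated.
public class TextUtils {

    /**
     * @deprecated Use {@link #normalizeWhitespace(String)} instead;
     *             the behavior is identical.
     */
    @Deprecated
    public static String cleanString(String s) {
        // Delegate so that existing callers keep getting the same results
        // while they migrate at their own pace.
        return normalizeWhitespace(s);
    }

    /** Trims the input and collapses internal whitespace runs to one space. */
    public static String normalizeWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(cleanString("  a   b "));          // prints "a b"
        System.out.println(normalizeWhitespace("  a   b "));  // prints "a b"
    }
}
```

Compiling a caller of `cleanString` then produces a deprecation warning that already spells out the migration path, which is exactly the property the policy asks for.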