+1 (non-binding)

I am disappointed, however, that this only mentions the API and not dependencies and transitive dependencies.

As Spark does not provide separation between its runtime classpath and the classpath used by applications, I believe Spark's dependencies and transitive dependencies should be considered part of the API for this policy. Breaking dependency upgrades and incompatible dependency versions are the source of much frustration.
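To make the pain concrete, here is a minimal sketch of the shading workaround applications often resort to today, using sbt-assembly and a hypothetical Guava version conflict (the relocated package name and versions are illustrative only, not a recommendation):

    // build.sbt -- assumes the sbt-assembly plugin is enabled in project/plugins.sbt.
    // Relocate the application's copy of Guava so it cannot clash with the
    // version Spark already puts on the runtime classpath.
    libraryDependencies += "com.google.guava" % "guava" % "28.2-jre"

    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.common.**" -> "myapp.shaded.guava.@1").inAll
    )

Needing a rule like this for every conflicting transitive dependency is exactly the kind of cost I would like the policy to acknowledge.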
michael

> On Mar 9, 2020, at 2:16 PM, Takuya UESHIN <ues...@happy-camper.st> wrote:
>
> +1 (binding)
>
>
> On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang <jiangxb1...@gmail.com> wrote:
> +1 (non-binding)
>
> Cheers,
>
> Xingbo
>
> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li <lix...@databricks.com> wrote:
> +1 (binding)
>
> Xiao
>
> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <denny.g....@gmail.com> wrote:
> +1 (non-binding)
>
> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
> The proposal itself seems good as a set of factors to consider. Thanks, Michael.
>
> Several of the concerns mentioned look like good points, in particular:
>
> > ... assuming that this is for public stable APIs, not APIs that are marked
> > as unstable, evolving, etc. ...
> I would like to confirm this. We already have API annotations such as
> Experimental, Unstable, etc., and the implication of each is still effective.
> If it's for stable APIs, it makes sense to me as well.
>
> > ... can we expand on 'when' an API change can occur? Since we are
> > proposing to diverge from semver. ...
> I think this is a good point. If we're proposing to diverge from semver, the
> delta compared to semver will have to be clarified to avoid different
> personal interpretations of the somewhat general principles.
>
> > ... can we narrow down on the migration from Apache Spark 2.4.5 to Apache
> > Spark 3.0+? ...
>
> Assuming these concerns will be addressed, +1 (binding).
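(An aside for context: the API annotations mentioned above, Experimental, Unstable, Evolving, and so on, live in org.apache.spark.annotation. A minimal sketch of how a non-stable API can be flagged, using a hypothetical class name:

    import org.apache.spark.annotation.{Evolving, Experimental}

    // Marked experimental/evolving, so the strongest compatibility
    // guarantees in the proposed rubric would not apply to it.
    @Experimental
    @Evolving
    class HypotheticalConnector {
      def read(): Unit = ()
    }

The annotations themselves carry no behavior; they only signal which compatibility promises apply.)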
> On Mon, Mar 9, 2020 at 4:53 PM Takeshi Yamamuro <linguin....@gmail.com> wrote:
> +1 (non-binding)
>
> Bests,
> Takeshi
>
> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang <gengliang.w...@databricks.com> wrote:
> +1 (non-binding)
>
> Gengliang
>
> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia <matei.zaha...@gmail.com> wrote:
> +1 as well.
>
> Matei
>
>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>> +1 (binding), assuming that this is for public stable APIs, not APIs that
>> are marked as unstable, evolving, etc.
>>
>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>> +1 (non-binding)
>>
>> Michael's section on the trade-offs of maintaining / removing an API is one
>> of the best reads I have seen on this mailing list. Enthusiastic +1.
>>
>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>> >
>> > This new policy has a good intention, but can we narrow down on the
>> > migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>> >
>> > I saw that a reverting PR already exists to bring back Spark 1.4 and
>> > 1.5 APIs based on this AS-IS suggestion.
>> >
>> > The AS-IS policy clearly mentions the JVM/Scala-level difficulty, and
>> > that's nice.
>> >
>> > However, for the other cases, it sounds like `recommending older APIs as
>> > much as possible` because of the following:
>> >
>> > > How long has the API been in Spark?
>> >
>> > We should be more careful when we add a new policy and should aim not
>> > to mislead users and 3rd-party library developers into thinking that
>> > "older is better".
>> >
>> > Technically, I'm wondering who will use new APIs in their examples (in
>> > books and on StackOverflow) if they always need to add a warning like
>> > `this only works on 2.4.0+`.
>> >
>> > Bests,
>> > Dongjoon.
>> >
>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>> >>
>> >> I am in broad agreement with the proposal; like any developer, I prefer
>> >> stable, well-designed APIs :-)
>> >>
>> >> Can we tie the proposal to the stability guarantees given by Spark and
>> >> reasonable expectations from users?
>> >> In my opinion, an unstable or evolving API could change, while an
>> >> experimental API which has been around for ages should be handled more
>> >> conservatively.
>> >> Which raises the question of how the stability guarantees specified by
>> >> annotations interact with the proposal.
>> >>
>> >> Also, can we expand on 'when' an API change can occur? Since we are
>> >> proposing to diverge from semver:
>> >> Patch release? Minor release? Only major release? Based on the 'impact'
>> >> of the API? Stability guarantees?
>> >>
>> >> Regards,
>> >> Mridul
>> >>
>> >>
>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>> >> >
>> >> > I'll start off the vote with a strong +1 (binding).
>> >> >
>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>> >> >>
>> >> >> I propose to add the following text to Spark's Semantic Versioning
>> >> >> policy and adopt it as the rubric that should be used when deciding to
>> >> >> break APIs (even at major versions such as 3.0).
>> >> >>
>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As this is
>> >> >> a procedural vote, the measure will pass if there are more favourable
>> >> >> votes than unfavourable ones. PMC votes are binding, but the community
>> >> >> is encouraged to add their voice to the discussion.
>> >> >>
>> >> >> [ ] +1 - Spark should adopt this policy.
>> >> >> [ ] -1 - Spark should not adopt this policy.
>> >> >>
>> >> >> <new policy>
>> >> >>
>> >> >> Considerations When Breaking APIs
>> >> >>
>> >> >> The Spark project strives to avoid breaking APIs or silently changing
>> >> >> behavior, even at major versions. While this is not always possible,
>> >> >> the balance of the following factors should be considered before
>> >> >> choosing to break an API.
>> >> >>
>> >> >> Cost of Breaking an API
>> >> >>
>> >> >> Breaking an API almost always has a non-trivial cost to the users of
>> >> >> Spark. A broken API means that Spark programs need to be rewritten
>> >> >> before they can be upgraded. However, there are a few considerations
>> >> >> when thinking about what the cost will be:
>> >> >>
>> >> >> Usage - an API that is actively used in many different places is
>> >> >> always very costly to break. While it is hard to know usage for sure,
>> >> >> there are a bunch of ways that we can estimate:
>> >> >>
>> >> >> How long has the API been in Spark?
>> >> >>
>> >> >> Is the API common even for basic programs?
>> >> >>
>> >> >> How often do we see recent questions in JIRA or mailing lists?
>> >> >>
>> >> >> How often does it appear in StackOverflow or blogs?
>> >> >>
>> >> >> Behavior after the break - How will a program that works today work
>> >> >> after the break? The following are listed roughly in order of
>> >> >> increasing severity:
>> >> >>
>> >> >> Will there be a compiler or linker error?
>> >> >>
>> >> >> Will there be a runtime exception?
>> >> >>
>> >> >> Will that exception happen after significant processing has been done?
>> >> >>
>> >> >> Will we silently return different answers? (very hard to debug, might
>> >> >> not even notice!)
>> >> >>
>> >> >>
>> >> >> Cost of Maintaining an API
>> >> >>
>> >> >> Of course, the above does not mean that we will never break any APIs.
>> >> >> We must also consider the cost both to the project and to our users of
>> >> >> keeping the API in question.
>> >> >>
>> >> >> Project Costs - Every API we have needs to be tested and needs to keep
>> >> >> working as other parts of the project change. These costs are
>> >> >> significantly exacerbated when external dependencies change (the JVM,
>> >> >> Scala, etc.). In some cases, while not completely technically
>> >> >> infeasible, the cost of maintaining a particular API can become too
>> >> >> high.
>> >> >>
>> >> >> User Costs - APIs also have a cognitive cost to users learning Spark
>> >> >> or trying to understand Spark programs. This cost becomes even higher
>> >> >> when the API in question has confusing or undefined semantics.
>> >> >>
>> >> >>
>> >> >> Alternatives to Breaking an API
>> >> >>
>> >> >> In cases where there is a "Bad API", but where the cost of removal is
>> >> >> also high, there are alternatives that should be considered that do
>> >> >> not hurt existing users but do address some of the maintenance costs.
>> >> >>
>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an important
>> >> >> point. Anytime we are adding a new interface to Spark we should
>> >> >> consider that we might be stuck with this API forever. Think deeply
>> >> >> about how new APIs relate to existing ones, as well as how you expect
>> >> >> them to evolve over time.
>> >> >>
>> >> >> Deprecation Warnings - All deprecation warnings should point to a
>> >> >> clear alternative and should never just say that an API is deprecated.
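(As a concrete illustration of that principle, not part of the proposed text: a deprecation in this spirit, on a hypothetical object and method, might look like

    import org.apache.spark.sql.SparkSession

    object LegacyApi {
      // The message names the replacement; the second argument records the
      // version in which the old API was deprecated.
      @deprecated("Use SparkSession.builder.getOrCreate() instead", "3.0.0")
      def hypotheticalOldEntryPoint(): SparkSession = SparkSession.builder.getOrCreate()
    }

rather than a bare "this method is deprecated".)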
>> >> >> Updated Docs - Documentation should point to the "best" recommended
>> >> >> way of performing a given task. In the cases where we maintain legacy
>> >> >> documentation, we should clearly point to newer APIs and suggest to
>> >> >> users the "right" way.
>> >> >>
>> >> >> Community Work - Many people learn Spark by reading blogs and other
>> >> >> sites such as StackOverflow. However, many of these resources are out
>> >> >> of date. Update them to reduce the cost of eventually removing
>> >> >> deprecated APIs.
>> >> >>
>> >> >> </new policy>
>
> --
> ---
> Takeshi Yamamuro
>
> --
> Takuya UESHIN
>
> http://twitter.com/ueshin