+1 (non-binding). Michael's section on the trade-offs of maintaining / removing an API is one of the best reads I have seen on this mailing list. Enthusiastic +1.
On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
> This new policy has a good intention, but can we narrow down on the
> migration from Apache Spark 2.4.5 to Apache Spark 3.0+?
>
> I saw that there already exists a reverting PR to bring back Spark 1.4
> and 1.5 APIs based on this AS-IS suggestion.
>
> The AS-IS policy clearly mentions the JVM/Scala-level difficulty, and
> that is nice.
>
> However, for the other cases, it sounds like `recommending older APIs
> as much as possible` because of the following:
>
> > How long has the API been in Spark?
>
> We had better be more careful when we add a new policy, and should aim
> not to mislead users and third-party library developers into thinking
> "older is better".
>
> Technically, I'm wondering who will use new APIs in their examples (in
> books and on StackOverflow) if they always need to add a warning like
> `this only works at 2.4.0+`.
>
> Bests,
> Dongjoon.
>
> On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>>
>> I am in broad agreement with the proposal; like any developer, I
>> prefer stable, well-designed APIs :-)
>>
>> Can we tie the proposal to the stability guarantees given by Spark
>> and the reasonable expectations of users?
>> In my opinion, an unstable or evolving API could change, while an
>> experimental API which has been around for ages should be handled
>> more conservatively.
>> Which raises the question of how the stability guarantees specified
>> by annotations interact with the proposal.
>>
>> Also, can we expand on 'when' an API change can occur, since we are
>> proposing to diverge from semver? Patch release? Minor release? Only
>> major release? Based on the 'impact' of the API? Stability guarantees?
>>
>> Regards,
>> Mridul
>>
>> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>> >
>> > I'll start off the vote with a strong +1 (binding).
>> >
>> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust <mich...@databricks.com> wrote:
>> >>
>> >> I propose to add the following text to Spark's Semantic Versioning
>> >> policy and adopt it as the rubric that should be used when deciding
>> >> to break APIs (even at major versions such as 3.0).
>> >>
>> >> I'll leave the vote open until Tuesday, March 10th at 2pm. As this
>> >> is a procedural vote, the measure will pass if there are more
>> >> favourable votes than unfavourable ones. PMC votes are binding, but
>> >> the community is encouraged to add their voice to the discussion.
>> >>
>> >> [ ] +1 - Spark should adopt this policy.
>> >> [ ] -1 - Spark should not adopt this policy.
>> >>
>> >> <new policy>
>> >>
>> >> Considerations When Breaking APIs
>> >>
>> >> The Spark project strives to avoid breaking APIs or silently
>> >> changing behavior, even at major versions. While this is not always
>> >> possible, the balance of the following factors should be considered
>> >> before choosing to break an API.
>> >>
>> >> Cost of Breaking an API
>> >>
>> >> Breaking an API almost always has a non-trivial cost to the users
>> >> of Spark. A broken API means that Spark programs need to be
>> >> rewritten before they can be upgraded. However, there are a few
>> >> considerations when thinking about what the cost will be:
>> >>
>> >> Usage - an API that is actively used in many different places is
>> >> always very costly to break.
>> >> While it is hard to know usage for sure, there are a bunch of ways
>> >> that we can estimate it:
>> >>
>> >> How long has the API been in Spark?
>> >>
>> >> Is the API common even for basic programs?
>> >>
>> >> How often do we see recent questions in JIRA or mailing lists?
>> >>
>> >> How often does it appear in StackOverflow or blogs?
>> >>
>> >> Behavior after the break - How will a program that works today work
>> >> after the break? The following are listed roughly in order of
>> >> increasing severity:
>> >>
>> >> Will there be a compiler or linker error?
>> >>
>> >> Will there be a runtime exception?
>> >>
>> >> Will that exception happen after significant processing has been
>> >> done?
>> >>
>> >> Will we silently return different answers? (very hard to debug;
>> >> users might not even notice!)
>> >>
>> >> Cost of Maintaining an API
>> >>
>> >> Of course, the above does not mean that we will never break any
>> >> APIs. We must also consider the cost, both to the project and to
>> >> our users, of keeping the API in question.
>> >>
>> >> Project Costs - Every API we have needs to be tested and needs to
>> >> keep working as other parts of the project change. These costs are
>> >> significantly exacerbated when external dependencies change (the
>> >> JVM, Scala, etc.). In some cases, while maintaining a particular
>> >> API is not technically infeasible, its cost can become too high.
>> >>
>> >> User Costs - APIs also have a cognitive cost to users learning
>> >> Spark or trying to understand Spark programs. This cost becomes
>> >> even higher when the API in question has confusing or undefined
>> >> semantics.
>> >>
>> >> Alternatives to Breaking an API
>> >>
>> >> In cases where there is a "Bad API", but where the cost of removal
>> >> is also high, there are alternatives that should be considered that
>> >> do not hurt existing users but do address some of the maintenance
>> >> costs.
>> >>
>> >> Avoid Bad APIs - While this is a bit obvious, it is an important
>> >> point. Anytime we are adding a new interface to Spark we should
>> >> consider that we might be stuck with this API forever. Think deeply
>> >> about how new APIs relate to existing ones, as well as how you
>> >> expect them to evolve over time.
>> >>
>> >> Deprecation Warnings - All deprecation warnings should point to a
>> >> clear alternative and should never just say that an API is
>> >> deprecated.
>> >>
>> >> Updated Docs - Documentation should point to the "best" recommended
>> >> way of performing a given task. In the cases where we maintain
>> >> legacy documentation, we should clearly point to newer APIs and
>> >> suggest to users the "right" way.
>> >>
>> >> Community Work - Many people learn Spark by reading blogs and other
>> >> sites such as StackOverflow. However, many of these resources are
>> >> out of date. Updating them reduces the cost of eventually removing
>> >> deprecated APIs.
>> >>
>> >> </new policy>
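To make the "Deprecation Warnings" point above concrete, here is a
minimal sketch in Scala (Spark's implementation language). OldApi,
NewApi, and their methods are hypothetical names for illustration, not
real Spark APIs:

    object NewApi {
      // The recommended replacement going forward.
      def sum(xs: Seq[Int]): Int = xs.sum
    }

    object OldApi {
      // A bare message like "this method is deprecated" gives users
      // nothing to act on. Instead, name the alternative and the
      // version in which the deprecation started:
      @deprecated("Use NewApi.sum instead", "3.0.0")
      def legacySum(xs: Seq[Int]): Int = NewApi.sum(xs)
    }

Anyone compiling a call to OldApi.legacySum then sees a warning that
already names the migration target, which is what the policy asks for.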
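And on Mridul's question about stability annotations: Spark ships
audience/stability annotations in org.apache.spark.annotation (for
example @Experimental and @DeveloperApi; newer versions also have
annotations like @Evolving and @Unstable, if I remember correctly).
Below is a sketch of one possible mapping onto the proposed rubric; the
mapping is my own assumption for discussion, not settled policy, and
both types are made up:

    import org.apache.spark.annotation.{DeveloperApi, Experimental}

    // Assumed mapping (for discussion only, not project policy):
    //   @Experimental - may change in any release while feedback is
    //                   gathered, but weigh the usage/age rubric first
    //   @DeveloperApi - aimed at advanced users; minor-release changes
    //                   seem acceptable, ideally with a deprecation cycle
    //   unannotated   - treat as stable; break only after the full
    //                   cost/benefit analysis in the policy

    @Experimental
    class ShinyNewSource {            // a new API that is still settling
      def poll(): Option[String] = None
    }

    @DeveloperApi
    trait InternalMetricsListener {   // a plug-in point for advanced users
      def onEvent(event: String): Unit
    }

If the proposal passes, it would help to state explicitly which of
these annotation levels the new rubric applies to.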