+1 (binding) I also assume that the implementation of the proposal will be carried out carefully, case by case, with sufficient open discussion.
Thanks, Dongjoon. On Mon, Mar 9, 2020 at 5:20 PM Holden Karau <hol...@pigscanfly.ca> wrote: > +1 (binding) on the original proposal. > > On Mon, Mar 9, 2020 at 1:32 PM Michael Heuer <heue...@gmail.com> wrote: > >> +1 (non-binding) >> >> I am disappointed however that this only mentions API and not >> dependencies and transitive dependencies. >> > I think upgrading dependencies continues to be reasonable. > >> >> As Spark does not provide separation between its runtime classpath and >> the classpath used by applications, I believe Spark's dependencies and >> transitive dependencies should be considered part of the API for this >> policy. Breaking dependency upgrades and incompatible dependency versions >> are the source of much frustration. >> > I myself have also faced this frustration. I believe we've increased some > shading to help here. Are there specific pain points you've experienced? > Maybe we can factor this discussion into another thread. >> >> > >> michael >> >> >> On Mar 9, 2020, at 2:16 PM, Takuya UESHIN <ues...@happy-camper.st> wrote: >> >> +1 (binding) >> >> >> On Mon, Mar 9, 2020 at 11:49 AM Xingbo Jiang <jiangxb1...@gmail.com> >> wrote: >> >>> +1 (non-binding) >>> >>> Cheers, >>> >>> Xingbo >>> >>> On Mon, Mar 9, 2020 at 9:35 AM Xiao Li <lix...@databricks.com> wrote: >>> >>>> +1 (binding) >>>> >>>> Xiao >>>> >>>> On Mon, Mar 9, 2020 at 8:33 AM Denny Lee <denny.g....@gmail.com> wrote: >>>> >>>>> +1 (non-binding) >>>>> >>>>> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon <gurwls...@gmail.com> >>>>> wrote: >>>>> >>>>>> The proposal itself seems good as a set of factors to consider. Thanks, >>>>>> Michael. >>>>>> >>>>>> Several concerns mentioned look like good points, in particular: >>>>>> >>>>>> > ... assuming that this is for public stable APIs, not APIs that are >>>>>> marked as unstable, evolving, etc. ... >>>>>> I would like to confirm this. We already have API annotations such as >>>>>> Experimental, Unstable, etc. 
and the implication of each is still >>>>>> effective. If it's for stable APIs, it makes sense to me as well. >>>>>> >>>>>> > ... can we expand on 'when' an API change can occur ? Since we are >>>>>> proposing to diverge from semver. ... >>>>>> I think this is a good point. If we're proposing to diverge >>>>>> from semver, the delta compared to semver will have to be clarified to >>>>>> avoid different personal interpretations of the somewhat general >>>>>> principles. >>>>>> >>>>>> > ... can we narrow down on the migration from Apache Spark 2.4.5 to >>>>>> Apache Spark 3.0+? ... >>>>>> >>>>>> Assuming these concerns will be addressed, +1 (binding). >>>>>> >>>>>> >>>>>> On Mon, Mar 9, 2020 at 4:53 PM Takeshi Yamamuro <linguin....@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> +1 (non-binding) >>>>>>> >>>>>>> Bests, >>>>>>> Takeshi >>>>>>> >>>>>>> On Mon, Mar 9, 2020 at 4:52 PM Gengliang Wang < >>>>>>> gengliang.w...@databricks.com> wrote: >>>>>>> >>>>>>>> +1 (non-binding) >>>>>>>> >>>>>>>> Gengliang >>>>>>>> >>>>>>>> On Mon, Mar 9, 2020 at 12:22 AM Matei Zaharia < >>>>>>>> matei.zaha...@gmail.com> wrote: >>>>>>>> >>>>>>>>> +1 as well. >>>>>>>>> >>>>>>>>> Matei >>>>>>>>> >>>>>>>>> On Mar 9, 2020, at 12:05 AM, Wenchen Fan <cloud0...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>> +1 (binding), assuming that this is for public stable APIs, not >>>>>>>>> APIs that are marked as unstable, evolving, etc. >>>>>>>>> >>>>>>>>> On Mon, Mar 9, 2020 at 1:10 AM Ismaël Mejía <ieme...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> +1 (non-binding) >>>>>>>>>> >>>>>>>>>> Michael's section on the trade-offs of maintaining / removing an >>>>>>>>>> API is one of >>>>>>>>>> the best reads I have seen on this mailing list. 
Enthusiastic +1 >>>>>>>>>> >>>>>>>>>> On Sat, Mar 7, 2020 at 8:28 PM Dongjoon Hyun < >>>>>>>>>> dongjoon.h...@gmail.com> wrote: >>>>>>>>>> > >>>>>>>>>> > This new policy has a good intention, but can we narrow down on >>>>>>>>>> the migration from Apache Spark 2.4.5 to Apache Spark 3.0+? >>>>>>>>>> > >>>>>>>>>> > I saw that there already exists a reverting PR to bring back >>>>>>>>>> Spark 1.4 and 1.5 APIs based on this AS-IS suggestion. >>>>>>>>>> > >>>>>>>>>> > The AS-IS policy clearly mentions the JVM/Scala-level >>>>>>>>>> difficulty, which is nice. >>>>>>>>>> > >>>>>>>>>> > However, for the other cases, it sounds like `recommending >>>>>>>>>> older APIs as much as possible` due to the following. >>>>>>>>>> > >>>>>>>>>> > > How long has the API been in Spark? >>>>>>>>>> > >>>>>>>>>> > We had better be more careful when we add a new policy and >>>>>>>>>> should aim not to mislead users and 3rd-party library developers >>>>>>>>>> into thinking >>>>>>>>>> "older is better". >>>>>>>>>> > >>>>>>>>>> > Technically, I'm wondering who will use new APIs in their >>>>>>>>>> examples (in books and on StackOverflow) if they always need to write >>>>>>>>>> an additional >>>>>>>>>> warning like `this only works on 2.4.0+`. >>>>>>>>>> > >>>>>>>>>> > Bests, >>>>>>>>>> > Dongjoon. >>>>>>>>>> > >>>>>>>>>> > On Fri, Mar 6, 2020 at 7:10 PM Mridul Muralidharan < >>>>>>>>>> mri...@gmail.com> wrote: >>>>>>>>>> >> >>>>>>>>>> >> I am in broad agreement with the proposal; like any developer, I >>>>>>>>>> prefer >>>>>>>>>> >> stable, well-designed APIs :-) >>>>>>>>>> >> >>>>>>>>>> >> Can we tie the proposal to stability guarantees given by Spark >>>>>>>>>> and >>>>>>>>>> >> reasonable expectations from users? >>>>>>>>>> >> In my opinion, an unstable or evolving API could change, while an >>>>>>>>>> >> experimental API which has been around for ages should be handled >>>>>>>>>> >> more conservatively. 
>>>>>>>>>> >> Which raises the question of how the stability guarantees >>>>>>>>>> >> specified by the annotations interact with the proposal. >>>>>>>>>> >> >>>>>>>>>> >> Also, can we expand on 'when' an API change can occur? Since >>>>>>>>>> we are >>>>>>>>>> >> proposing to diverge from semver. >>>>>>>>>> >> Patch release? Minor release? Only major release? Based on >>>>>>>>>> 'impact' >>>>>>>>>> >> of API? Stability guarantees? >>>>>>>>>> >> >>>>>>>>>> >> Regards, >>>>>>>>>> >> Mridul >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> On Fri, Mar 6, 2020 at 7:01 PM Michael Armbrust < >>>>>>>>>> mich...@databricks.com> wrote: >>>>>>>>>> >> > >>>>>>>>>> >> > I'll start off the vote with a strong +1 (binding). >>>>>>>>>> >> > >>>>>>>>>> >> > On Fri, Mar 6, 2020 at 1:01 PM Michael Armbrust < >>>>>>>>>> mich...@databricks.com> wrote: >>>>>>>>>> >> >> >>>>>>>>>> >> >> I propose to add the following text to Spark's Semantic >>>>>>>>>> Versioning policy and adopt it as the rubric that should be used when >>>>>>>>>> deciding to break APIs (even at major versions such as 3.0). >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> I'll leave the vote open until Tuesday, March 10th at 2pm. >>>>>>>>>> As this is a procedural vote, the measure will pass if there are more >>>>>>>>>> favourable votes than unfavourable ones. PMC votes are binding, but >>>>>>>>>> the >>>>>>>>>> community is encouraged to add their voice to the discussion. >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> [ ] +1 - Spark should adopt this policy. >>>>>>>>>> >> >> >>>>>>>>>> >> >> [ ] -1 - Spark should not adopt this policy. >>>>>>>>>> >> >> >>>>>>>>>> >> >> <new policy> >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> Considerations When Breaking APIs >>>>>>>>>> >> >> >>>>>>>>>> >> >> The Spark project strives to avoid breaking APIs or >>>>>>>>>> silently changing behavior, even at major versions. 
While this is not >>>>>>>>>> always possible, the balance of the following factors should be >>>>>>>>>> considered >>>>>>>>>> before choosing to break an API. >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> Cost of Breaking an API >>>>>>>>>> >> >> >>>>>>>>>> >> >> Breaking an API almost always has a non-trivial cost to the >>>>>>>>>> users of Spark. A broken API means that Spark programs need to be >>>>>>>>>> rewritten >>>>>>>>>> before they can be upgraded. However, there are a few considerations >>>>>>>>>> when >>>>>>>>>> thinking about what the cost will be: >>>>>>>>>> >> >> >>>>>>>>>> >> >> Usage - an API that is actively used in many different >>>>>>>>>> places is always very costly to break. While it is hard to know >>>>>>>>>> usage for >>>>>>>>>> sure, there are a bunch of ways that we can estimate: >>>>>>>>>> >> >> >>>>>>>>>> >> >> How long has the API been in Spark? >>>>>>>>>> >> >> >>>>>>>>>> >> >> Is the API common even for basic programs? >>>>>>>>>> >> >> >>>>>>>>>> >> >> How often do we see recent questions in JIRA or mailing >>>>>>>>>> lists? >>>>>>>>>> >> >> >>>>>>>>>> >> >> How often does it appear in StackOverflow or blogs? >>>>>>>>>> >> >> >>>>>>>>>> >> >> Behavior after the break - How will a program that works >>>>>>>>>> today work after the break? The following are listed roughly in >>>>>>>>>> order of >>>>>>>>>> increasing severity: >>>>>>>>>> >> >> >>>>>>>>>> >> >> Will there be a compiler or linker error? >>>>>>>>>> >> >> >>>>>>>>>> >> >> Will there be a runtime exception? >>>>>>>>>> >> >> >>>>>>>>>> >> >> Will that exception happen after significant processing has >>>>>>>>>> been done? >>>>>>>>>> >> >> >>>>>>>>>> >> >> Will we silently return different answers? (very hard to >>>>>>>>>> debug, might not even notice!) >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> Cost of Maintaining an API >>>>>>>>>> >> >> >>>>>>>>>> >> >> Of course, the above does not mean that we will never break >>>>>>>>>> any APIs. 
We must also consider the cost both to the project and to >>>>>>>>>> our >>>>>>>>>> users of keeping the API in question. >>>>>>>>>> >> >> >>>>>>>>>> >> >> Project Costs - Every API we have needs to be tested and >>>>>>>>>> needs to keep working as other parts of the project change. These >>>>>>>>>> costs >>>>>>>>>> are significantly exacerbated when external dependencies change (the >>>>>>>>>> JVM, >>>>>>>>>> Scala, etc). In some cases, while not completely technically >>>>>>>>>> infeasible, >>>>>>>>>> the cost of maintaining a particular API can become too high. >>>>>>>>>> >> >> >>>>>>>>>> >> >> User Costs - APIs also have a cognitive cost to users >>>>>>>>>> learning Spark or trying to understand Spark programs. This cost >>>>>>>>>> becomes >>>>>>>>>> even higher when the API in question has confusing or undefined >>>>>>>>>> semantics. >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> Alternatives to Breaking an API >>>>>>>>>> >> >> >>>>>>>>>> >> >> In cases where there is a "Bad API", but where the cost of >>>>>>>>>> removal is also high, there are alternatives that should be >>>>>>>>>> considered that >>>>>>>>>> do not hurt existing users but do address some of the maintenance >>>>>>>>>> costs. >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> Avoid Bad APIs - While this is a bit obvious, it is an >>>>>>>>>> important point. Anytime we are adding a new interface to Spark we >>>>>>>>>> should >>>>>>>>>> consider that we might be stuck with this API forever. Think deeply >>>>>>>>>> about >>>>>>>>>> how new APIs relate to existing ones, as well as how you expect them >>>>>>>>>> to >>>>>>>>>> evolve over time. >>>>>>>>>> >> >> >>>>>>>>>> >> >> Deprecation Warnings - All deprecation warnings should >>>>>>>>>> point to a clear alternative and should never just say that an API is >>>>>>>>>> deprecated. >>>>>>>>>> >> >> >>>>>>>>>> >> >> Updated Docs - Documentation should point to the "best" >>>>>>>>>> recommended way of performing a given task. 
In the cases where we >>>>>>>>>> maintain >>>>>>>>>> legacy documentation, we should clearly point to newer APIs and >>>>>>>>>> suggest to >>>>>>>>>> users the "right" way. >>>>>>>>>> >> >> >>>>>>>>>> >> >> Community Work - Many people learn Spark by reading blogs >>>>>>>>>> and other sites such as StackOverflow. However, many of these >>>>>>>>>> resources are >>>>>>>>>> out of date. Update them to reduce the cost of eventually removing >>>>>>>>>> deprecated APIs. >>>>>>>>>> >> >> >>>>>>>>>> >> >> >>>>>>>>>> >> >> </new policy> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >>>>>>>>>> >> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> --- >>>>>>> Takeshi Yamamuro >>>>>>> >>>>>> >>>> >>>> -- >>>> <https://databricks.com/sparkaisummit/north-america> >>>> >>> >> >> -- >> Takuya UESHIN >> >> http://twitter.com/ueshin >> >> >> > > -- > Twitter: https://twitter.com/holdenkarau > Books (Learning Spark, High Performance Spark, etc.): > https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau >
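[Editor's note] The "Deprecation Warnings" point in the proposed policy above (warnings must point to a clear alternative, never just say that an API is deprecated) is the most mechanical of the guidelines, so a minimal sketch may help. This is illustrative Python, not Spark code, and the function names `parse_json` and `parse_json_strict` are hypothetical:

```python
import json
import warnings


def parse_json(text):
    """Hypothetical deprecated API. Per the policy, the warning names a
    concrete replacement instead of merely saying "deprecated"."""
    warnings.warn(
        "parse_json is deprecated and will be removed in the next major "
        "release; use parse_json_strict instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    # Delegate to the replacement so old callers keep working meanwhile.
    return parse_json_strict(text)


def parse_json_strict(text):
    """The recommended replacement that the warning points users to."""
    return json.loads(text)
```

The key detail is that the warning message names the replacement, so a user who sees it in a log knows exactly what to migrate to without searching docs or mailing lists.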