Overall makes sense to me, but I have the same questions as others on the thread.
Is this only applying to stable APIs? How are we going to apply it to 3.0?
The way I read this proposal, it isn't really saying we can't break APIs on major
releases; it's just saying we should spend more time making sure it's worth it.

Tom
    On Friday, March 6, 2020, 08:59:03 PM CST, Michael Armbrust
<mich...@databricks.com> wrote:

I propose to add the following text to Spark's Semantic Versioning policy and 
adopt it as the rubric that should be used when deciding to break APIs (even at 
major versions such as 3.0).

I'll leave the vote open until Tuesday, March 10th at 2pm. As this is a 
procedural vote, the measure will pass if there are more favourable votes than 
unfavourable ones. PMC votes are binding, but the community is encouraged to 
add their voice to the discussion.

[ ] +1 - Spark should adopt this policy.

[ ] -1  - Spark should not adopt this policy.

<new policy>

Considerations When Breaking APIs

The Spark project strives to avoid breaking APIs or silently changing behavior, 
even at major versions. While this is not always possible, the balance of the 
following factors should be considered before choosing to break an API.


Cost of Breaking an API

Breaking an API almost always has a non-trivial cost to the users of Spark. A 
broken API means that Spark programs need to be rewritten before they can be 
upgraded. However, there are a few considerations when thinking about what the 
cost will be:
   
  - Usage - an API that is actively used in many different places is always
    very costly to break. While it is hard to know usage for sure, there are a
    number of ways we can estimate it:
      - How long has the API been in Spark?
      - Is the API common even for basic programs?
      - How often do we see recent questions in JIRA or mailing lists?
      - How often does it appear in StackOverflow or blogs?

   
  - Behavior after the break - How will a program that works today behave
    after the break? The following are listed roughly in order of increasing
    severity (see the sketch after this list):
      - Will there be a compiler or linker error?
      - Will there be a runtime exception?
      - Will that exception happen after significant processing has been done?
      - Will we silently return different answers? (very hard to debug; users
        might not even notice!)
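
To make the last, most severe case concrete, here is a minimal Scala sketch.
The normalize function is hypothetical, not a real Spark API; it only
illustrates how a signature-preserving behavior change produces no compiler
error and no exception, just silently different answers.

    object SilentChangeExample {
      // Version 1: normalize only trims surrounding whitespace.
      def normalize(s: String): String = s.trim

      // Version 2 (a hypothetical "improvement"): also lowercase the result.
      // The signature is unchanged, so existing callers recompile and run
      // without any error -- they just start getting different output.
      // def normalize(s: String): String = s.trim.toLowerCase

      def main(args: Array[String]): Unit = {
        // Prints "Spark" under version 1; version 2 would print "spark".
        println(normalize("  Spark  "))
      }
    }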

Cost of Maintaining an API

Of course, the above does not mean that we will never break any APIs. We must 
also consider the cost both to the project and to our users of keeping the API 
in question.
   
  - Project Costs - Every API we have needs to be tested and needs to keep
    working as other parts of the project change. These costs are
    significantly exacerbated when external dependencies change (the JVM,
    Scala, etc.). In some cases, while maintaining a particular API is not
    technically infeasible, the cost of doing so can become too high.
  - User Costs - APIs also have a cognitive cost to users learning Spark or
    trying to understand Spark programs. This cost becomes even higher when
    the API in question has confusing or undefined semantics.

Alternatives to Breaking an API

In cases where there is a "Bad API" but the cost of removing it is also high,
there are alternatives worth considering that do not hurt existing users but do
address some of the maintenance costs.

   
  - Avoid Bad APIs - While this is a bit obvious, it is an important point.
    Any time we add a new interface to Spark, we should consider that we might
    be stuck with this API forever. Think deeply about how new APIs relate to
    existing ones, as well as how you expect them to evolve over time.
  - Deprecation Warnings - All deprecation warnings should point to a clear
    alternative and should never just say that an API is deprecated (see the
    sketch after this list).
  - Updated Docs - Documentation should point to the "best" recommended way of
    performing a given task. In the cases where we maintain legacy
    documentation, we should clearly point to newer APIs and suggest to users
    the "right" way.
  - Community Work - Many people learn Spark by reading blogs and other sites
    such as StackOverflow. However, many of these resources are out of date.
    Updating them reduces the cost of eventually removing deprecated APIs.
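
To illustrate the deprecation-warning point above, here is a minimal Scala
sketch using the language's built-in @deprecated annotation. The method names
are hypothetical, not real Spark APIs.

    object DeprecationExample {
      // Good: the message names a concrete replacement and the release in
      // which the deprecation happened, so users know exactly what to do.
      @deprecated("Use repartitionByRange(n) instead", "3.0.0")
      def rangePartition(n: Int): Unit = repartitionByRange(n)

      def repartitionByRange(n: Int): Unit = {
        // ... implementation elided for the sketch ...
      }

      // Bad: says only that the API is deprecated, with no path forward.
      // @deprecated("This method is deprecated", "3.0.0")
      // def oldMethod(): Unit = ()
    }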


</new policy>  
