+1
*Michel Miotto Barbosa*, *Data Science/Software Engineer*
Studying MBA Global Financial Broker at IBMEC SP,
Studying Economic Science at PUC SP
MBA in Project Management, Graduate in Software Engineering
phone: +55 11 984 342 347, @michelmb <https://twitter.com/MichelMB>
https://br.linkedin.com/in/michelmiottobarbosa
On Mon, Feb 24, 2020 at 8:03 PM Michael Armbrust <
mich...@databricks.com> wrote:
> Hello Everyone,
>
> As more users have started upgrading to Spark 3.0 preview (including
> myself), there have been many discussions around APIs that have been broken
> compared with Spark 2.x. In many of these discussions, one of the
> rationales for breaking an API seems to be "Spark follows semantic
> versioning <https://spark.apache.org/versioning-policy.html>, so this
> major release is our chance to get it right [by breaking APIs]". Similarly,
> in many cases the response to questions about why an API was completely
> removed has been, "this API has been deprecated since x.x, so we have to
> remove it".
>
> As a long-time contributor to and user of Spark, this interpretation of
> the policy is concerning to me. This reasoning misses the intention of the
> original policy, and I am worried that it will hurt the long-term success
> of the project.
>
> I definitely understand that these are hard decisions, and I'm not
> proposing that we never remove anything from Spark. However, I would like
> to give some additional context and also propose a different rubric for
> thinking about API breakage moving forward.
>
> Spark adopted semantic versioning back in 2014 during the preparations for
> the 1.0 release. As this was the first major release -- and as, up until
> fairly recently, Spark had only been an academic project -- no real
> promises had been made about API stability ever.
>
> During the discussion, some committers suggested that this was an
> opportunity to clean up cruft and give the Spark APIs a once-over, making
> cosmetic changes to improve consistency. However, in the end, it was
> decided that in many cases it was not in the best interests of the Spark
> community to break things just because we could. Matei actually said it
> pretty forcefully
> <http://apache-spark-developers-list.1001551.n3.nabble.com/Proposal-for-Spark-Release-Strategy-td464i20.html#a503>:
>
> I know that some names are suboptimal, but I absolutely detest breaking
> APIs, config names, etc. I’ve seen it happen way too often in other
> projects (even things we depend on that are officially post-1.0, like Akka
> or Protobuf or Hadoop), and it’s very painful. I think that we as fairly
> cutting-edge users are okay with libraries occasionally changing, but many
> others will consider it a show-stopper. Given this, I think that any
> cosmetic change now, even though it might improve clarity slightly, is not
> worth the tradeoff in terms of creating an update barrier for existing
> users.
>
> In the end, while some changes were made, most APIs remained the same and
> users of Spark <= 0.9 were pretty easily able to upgrade to 1.0. I think
> this served the project very well, as compatibility means users are able to
> upgrade and we keep as many people on the latest versions of Spark (though
> maybe not the latest APIs of Spark) as possible.
>
> As Spark grows, I think compatibility actually becomes more important and
> we should be more conservative rather than less. Today, there are very
> likely more Spark programs running than there were at any other time in the
> past. Spark is no longer a tool only used by advanced hackers, it is now
> also running "traditional enterprise workloads." In many cases these jobs
> are powering important processes long after the original author leaves.
>
> Broken APIs can also affect libraries that extend Spark. This can be even
> harder for users: if a library they need has not been upgraded to use the
> new APIs, they are stuck.
>
> Given all of this, I'd like to propose the following rubric as an addition
> to our semantic versioning policy. After discussion and if people agree
> this is a good idea, I'll call a vote of the PMC to ratify its inclusion in
> the official policy.
>
> Considerations When Breaking APIs
>
> The Spark project strives to avoid breaking APIs or silently changing
> behavior, even at major versions. While this is not always possible, the
> balance of the following factors should be considered before choosing to
> break an API.
>
> Cost of Breaking an API
>
> Breaking an API almost always has a non-trivial cost to the users of
> Spark. A broken API means that Spark pro