Re: Proposal for Spark Release Strategy

Patrick Wendell Fri, 07 Feb 2014 14:25:15 -0800

Will,

Thanks for these thoughts - this is something we should try to be
attentive to in the way we think about versioning.


(2)-(5) are pretty consistent with the guidelines we already follow. I
think the biggest proposed difference is to be conscious of (1), which
at least I had not given much thought to in the past. Specifically, if
we make major version upgrades of dependencies within a major release
of Spark, it can cause issues for downstream packagers. I can't easily
recall how often we do this or whether this will be hard for us to
guarantee (maybe others can...). It's something to keep in mind though
- thanks for bringing it up.

- Patrick
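
As an illustration of point (1) in the quoted message below, here is a minimal sketch of how a packager might flag incompatible dependency changes between two Spark releases that share a major version. It assumes, purely for the example, that every dependency uses a three-part numeric version and signals incompatibility only through its major number; real libraries often do not. The function names and the dependency names (libfoo, libbar) are hypothetical and are not part of any Spark tooling.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.MAINTENANCE' string into three integers."""
    major, minor, maintenance = (int(part) for part in version.split("."))
    return major, minor, maintenance


def incompatible_dependency_bumps(
    old_release: str,
    new_release: str,
    old_deps: Dict[str, str],
    new_deps: Dict[str, str],
) -> List[str]:
    """Return dependencies whose major version changed within one major release line.

    Assumption: a change in a dependency's major number means it is incompatible.
    """
    # A new major release is allowed to switch to incompatible dependencies.
    if parse_version(new_release)[0] != parse_version(old_release)[0]:
        return []

    flagged = []
    for name, old_version in old_deps.items():
        new_version = new_deps.get(name)
        if new_version is None:
            # Removed dependency: point (5) allows removal at any time.
            continue
        if parse_version(new_version)[0] != parse_version(old_version)[0]:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical dependency sets for two releases in the 1.x line.
deps_before = {"libfoo": "2.3.0", "libbar": "1.0.0"}
deps_after = {"libfoo": "3.0.0", "libbar": "1.0.1"}
flagged = incompatible_dependency_bumps("1.0.0", "1.1.0", deps_before, deps_after)
```

Under these assumptions, only libfoo is flagged, because its major number changed while the release stayed in the 1.x line.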

On Fri, Feb 7, 2014 at 10:28 AM, Will Benton <[email protected]> wrote:
> Semantic versioning is great, and I think the proposed extensions for 
> adopting it in Spark make a lot of sense.  However, by focusing strictly on 
> public APIs, semantic versioning only solves part of the problem (albeit 
> certainly the most interesting part).  I'd like to raise another issue that 
> the semantic versioning guidelines explicitly exclude: the relative stability 
> of dependencies and dependency versions.  This is less of a concern for 
> end-users than it is for downstream packagers, but I believe that the 
> relative stability of a dependency stack *should* be part of what is implied 
> by a major version number.
>
> Here are some suggestions for how to incorporate dependency stack versioning 
> into semantic versioning in order to make life easier for downstreams; please 
> consider all of these to be prefaced with "If at all possible,":
>
> 1.  Switching a dependency to an incompatible version should be reserved for 
> major releases.  In general, downstream operating system distributions 
> support only one version of each library, although in rare cases alternate 
> versions are available for backwards compatibility.  If a bug fix or feature 
> addition in a patch or minor release depends on adopting a version of some 
> library that is incompatible with the one used by the prior patch or minor 
> release, then downstreams may not be able to incorporate the fix or 
> functionality until every package impacted by the dependency can be updated 
> to work with the new version.
>
> 2.  New dependencies should only be introduced with new features (and thus 
> with new minor versions).  This suggestion is probably uncontroversial, since 
> features are more likely than bugfixes to require additional external 
> libraries.
>
> 3.  The scope of new dependencies should be proportional to the benefit that 
> they provide.  Of course, we want to avoid reinventing the wheel, but if the 
> alternative is pulling in a framework for WheelFactory generation, a 
> WheelContainer library, and a dozen transitive dependencies, maybe it's worth 
> considering reinventing at least the simplest and least general wheels.
>
> 4.  If new functionality requires additional dependencies, it should be 
> developed to work with the most recent stable version of those libraries that 
> is generally available.  Again, since downstreams typically support only one 
> version per library at a time, this will make their job easier.  (This will 
> benefit everyone, though, since the most recent version of some dependency is 
> more likely to see active maintenance efforts.)
>
> 5.  Dependencies can be removed at any time.
>
> I hope these can be a starting point for further discussion and adoption of 
> practices that demarcate the scope of dependency changes in a given version 
> stream.
>
>
>
> best,
> wb
>
>
> ----- Original Message -----
>> From: "Patrick Wendell" <[email protected]>
>> To: [email protected]
>> Sent: Wednesday, February 5, 2014 6:20:10 PM
>> Subject: Proposal for Spark Release Strategy
>>
>> Hi Everyone,
>>
>> In an effort to coordinate development amongst the growing list of
>> Spark contributors, I've taken some time to write up a proposal to
>> formalize various pieces of the development process. The next release
>> of Spark will likely be Spark 1.0.0, so this message is intended in
>> part to coordinate the release plan for 1.0.0 and future releases.
>> I'll post this on the wiki after discussing it on this thread as
>> tentative project guidelines.
>>
>> == Spark Release Structure ==
>> Starting with Spark 1.0.0, the Spark project will follow the semantic
>> versioning guidelines (http://semver.org/) with a few deviations.
>> These small differences account for Spark's nature as a multi-module
>> project.
>>
>> Each Spark release will be versioned:
>> [MAJOR].[MINOR].[MAINTENANCE]
>>
>> All releases with the same major version number will have API
>> compatibility, defined as [1]. Major version numbers will remain
>> stable over long periods of time. For instance, 1.X.Y may last 1 year
>> or more.
>>
>> Minor releases will typically contain new features and improvements.
>> The target frequency for minor releases is every 3-4 months. One
>> change we'd like to make is to announce fixed release dates and merge
>> windows for each release, to facilitate coordination. Each minor
>> release will have a merge window where new patches can be merged, a QA
>> window when only fixes can be merged, then a final period where voting
>> occurs on release candidates. These windows will be announced
>> immediately after the previous minor release to give people plenty of
>> time, and over time, we might make the whole release process more
>> regular (similar to Ubuntu). At the bottom of this document is an
>> example window for the 1.0.0 release.
>>
>> Maintenance releases will occur more frequently and depend on specific
>> patches introduced (e.g. bug fixes) and their urgency. In general
>> these releases are designed to patch bugs. However, higher level
>> libraries may introduce small features, such as a new algorithm,
>> provided they are entirely additive and isolated from existing code
>> paths. Spark core may not introduce any features.
>>
>> When new components are added to Spark, they may initially be marked
>> as "alpha". Alpha components do not have to abide by the above
>> guidelines; however, to the maximum extent possible, they should try
>> to. Once they are marked "stable" they have to follow these
>> guidelines. At present, GraphX is the only alpha component of Spark.
>>
>> [1] API compatibility:
>>
>> An API is any public class or interface exposed in Spark that is not
>> marked as semi-private or experimental. Release A is API compatible
>> with release B if code compiled against release A *compiles cleanly*
>> against B. This does not guarantee that a compiled application that is
>> linked against version A will link cleanly against version B without
>> re-compiling. Link-level compatibility is something we'll try to
>> guarantee as well, and we might make it a requirement in the
>> future, but challenges with things like Scala versions have made this
>> difficult to guarantee in the past.
>>
>> == Merging Pull Requests ==
>> To merge pull requests, committers are encouraged to use this tool [2]
>> to collapse the request into one commit rather than manually
>> performing git merges. It will also format the commit message nicely
>> in a way that can be easily parsed later when writing credits.
>> Currently it is maintained in a public utility repository, but we'll
>> merge it into mainline Spark soon.
>>
>> [2] https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py
>>
>> == Tentative Release Window for 1.0.0 ==
>> Feb 1st - April 1st: General development
>> April 1st: Code freeze for new features
>> April 15th: RC1
>>
>> == Deviations ==
>> For now, the proposal is to consider these tentative guidelines. We
>> can vote to formalize these as project rules at a later time after
>> some experience working with them. Once formalized, any deviation from
>> these guidelines will be subject to a lazy majority vote.
>>
>> - Patrick
>>
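
The [MAJOR].[MINOR].[MAINTENANCE] scheme in the quoted proposal can be sketched as a small classifier. This is only an illustration under stated assumptions: version strings are exactly three dot-separated integers, and the function names are hypothetical rather than anything shipped with Spark. The compatibility check reflects only the proposal's source-level promise (code compiled against one release compiles against another with the same major number), not link-level compatibility.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.MAINTENANCE' string into three integers."""
    major, minor, maintenance = (int(part) for part in version.split("."))
    return major, minor, maintenance


def release_kind(previous: str, new: str) -> str:
    """Classify `new` relative to `previous` as major, minor, or maintenance."""
    old = parse_version(previous)
    cur = parse_version(new)
    if cur <= old:
        raise ValueError(f"{new} does not come after {previous}")
    if cur[0] != old[0]:
        return "major"
    if cur[1] != old[1]:
        return "minor"
    return "maintenance"


def promises_api_compatibility(release_a: str, release_b: str) -> bool:
    """True when both releases share a major number, per the proposal."""
    return parse_version(release_a)[0] == parse_version(release_b)[0]


kinds = [
    release_kind("1.0.0", "1.0.1"),
    release_kind("1.0.1", "1.1.0"),
    release_kind("1.3.2", "2.0.0"),
]
```

Here a release such as 1.0.1 after 1.0.0 is classified as a maintenance release, 1.1.0 as a minor release, and 2.0.0 as a major release; only the first two stay within the API-compatibility promise relative to 1.0.0.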
