Will,

Thanks for these thoughts - this is something we should try to be attentive to in the way we think about versioning.
(2)-(5) are pretty consistent with the guidelines we already follow. I think the biggest proposed difference is to be conscious of (1), which at least I had not given much thought to in the past. Specifically, if we make major version upgrades of dependencies within a major release of Spark, it can cause issues for downstream packagers. I can't easily recall how often we do this or whether this will be hard for us to guarantee (maybe others can...). It's something to keep in mind though - thanks for bringing it up.

- Patrick

On Fri, Feb 7, 2014 at 10:28 AM, Will Benton <[email protected]> wrote:
> Semantic versioning is great, and I think the proposed extensions for
> adopting it in Spark make a lot of sense. However, by focusing strictly on
> public APIs, semantic versioning only solves part of the problem (albeit
> certainly the most interesting part). I'd like to raise another issue that
> the semantic versioning guidelines explicitly exclude: the relative stability
> of dependencies and dependency versions. This is less of a concern for
> end-users than it is for downstream packagers, but I believe that the
> relative stability of a dependency stack *should* be part of what is implied
> by a major version number.
>
> Here are some suggestions for how to incorporate dependency stack versioning
> into semantic versioning in order to make life easier for downstreams; please
> consider all of these to be prefaced with "If at all possible,":
>
> 1. Switching a dependency to an incompatible version should be reserved for
> major releases. In general, downstream operating system distributions
> support only one version of each library, although in rare cases alternate
> versions are available for backwards compatibility. If a bug fix or feature
> addition in a patch or minor release depends on adopting a version of some
> library that is incompatible with the one used by the prior patch or minor
> release, then downstreams may not be able to incorporate the fix or
> functionality until every package impacted by the dependency can be updated
> to work with the new version.
>
> 2. New dependencies should only be introduced with new features (and thus
> with new minor versions). This suggestion is probably uncontroversial, since
> features are more likely than bugfixes to require additional external
> libraries.
>
> 3. The scope of new dependencies should be proportional to the benefit that
> they provide. Of course, we want to avoid reinventing the wheel, but if the
> alternative is pulling in a framework for WheelFactory generation, a
> WheelContainer library, and a dozen transitive dependencies, maybe it's worth
> considering reinventing at least the simplest and least general wheels.
>
> 4. If new functionality requires additional dependencies, it should be
> developed to work with the most recent stable version of those libraries that
> is generally available. Again, since downstreams typically support only one
> version per library at a time, this will make their job easier. (This will
> benefit everyone, though, since the most recent version of some dependency is
> more likely to see active maintenance efforts.)
>
> 5. Dependencies can be removed at any time.
>
> I hope these can be a starting point for further discussion and adoption of
> practices that demarcate the scope of dependency changes in a given version
> stream.
>
> best,
> wb
>
> ----- Original Message -----
>> From: "Patrick Wendell" <[email protected]>
>> To: [email protected]
>> Sent: Wednesday, February 5, 2014 6:20:10 PM
>> Subject: Proposal for Spark Release Strategy
>>
>> Hi Everyone,
>>
>> In an effort to coordinate development amongst the growing list of
>> Spark contributors, I've taken some time to write up a proposal to
>> formalize various pieces of the development process. The next release
>> of Spark will likely be Spark 1.0.0, so this message is intended in
>> part to coordinate the release plan for 1.0.0 and future releases.
>> I'll post this on the wiki after discussing it on this thread as
>> tentative project guidelines.
>>
>> == Spark Release Structure ==
>> Starting with Spark 1.0.0, the Spark project will follow the semantic
>> versioning guidelines (http://semver.org/) with a few deviations.
>> These small differences account for Spark's nature as a multi-module
>> project.
>>
>> Each Spark release will be versioned:
>> [MAJOR].[MINOR].[MAINTENANCE]
>>
>> All releases with the same major version number will have API
>> compatibility, defined as [1]. Major version numbers will remain
>> stable over long periods of time. For instance, 1.X.Y may last 1 year
>> or more.
>>
>> Minor releases will typically contain new features and improvements.
>> The target frequency for minor releases is every 3-4 months. One
>> change we'd like to make is to announce fixed release dates and merge
>> windows for each release, to facilitate coordination. Each minor
>> release will have a merge window where new patches can be merged, a QA
>> window when only fixes can be merged, then a final period where voting
>> occurs on release candidates. These windows will be announced
>> immediately after the previous minor release to give people plenty of
>> time, and over time, we might make the whole release process more
>> regular (similar to Ubuntu).
>> At the bottom of this document is an
>> example window for the 1.0.0 release.
>>
>> Maintenance releases will occur more frequently and depend on specific
>> patches introduced (e.g. bug fixes) and their urgency. In general
>> these releases are designed to patch bugs. However, higher-level
>> libraries may introduce small features, such as a new algorithm,
>> provided they are entirely additive and isolated from existing code
>> paths. Spark core may not introduce any features.
>>
>> When new components are added to Spark, they may initially be marked
>> as "alpha". Alpha components do not have to abide by the above
>> guidelines; however, to the maximum extent possible, they should try
>> to. Once they are marked "stable" they have to follow these
>> guidelines. At present, GraphX is the only alpha component of Spark.
>>
>> [1] API compatibility:
>>
>> An API is any public class or interface exposed in Spark that is not
>> marked as semi-private or experimental. Release A is API compatible
>> with release B if code compiled against release A *compiles cleanly*
>> against B. This does not guarantee that a compiled application that is
>> linked against version A will link cleanly against version B without
>> re-compiling. Link-level compatibility is something we'll try to
>> guarantee as well, and we might make it a requirement in the
>> future, but challenges with things like Scala versions have made this
>> difficult to guarantee in the past.
>>
>> == Merging Pull Requests ==
>> To merge pull requests, committers are encouraged to use this tool [2]
>> to collapse the request into one commit rather than manually
>> performing git merges. It will also format the commit message nicely
>> in a way that can be easily parsed later when writing credits.
>> Currently it is maintained in a public utility repository, but we'll
>> merge it into mainline Spark soon.
>>
>> [2] https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py
>>
>> == Tentative Release Window for 1.0.0 ==
>> Feb 1st - April 1st: General development
>> April 1st: Code freeze for new features
>> April 15th: RC1
>>
>> == Deviations ==
>> For now, the proposal is to consider these tentative guidelines. We
>> can vote to formalize these as project rules at a later time after
>> some experience working with them. Once formalized, any deviation from
>> these guidelines will be subject to a lazy majority vote.
>>
>> - Patrick
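To see the release scheme and the dependency suggestions side by side, here is a rough Python sketch. It classifies a release step under the [MAJOR].[MINOR].[MAINTENANCE] scheme and then checks suggestions (1), (2) and (5) from Will's message against that step. It rests on hypothetical assumptions: dependencies are given as a map from library name to version string, and a change in a library's leading version number is treated as an incompatible upgrade, which only approximates reality because not every library versions semantically. The function names and the example libraries are made up for illustration and are not part of any Spark tooling.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


def parse_release(text: str) -> Version:
    """Parse a [MAJOR].[MINOR].[MAINTENANCE] string such as "1.0.0"."""
    major, minor, maintenance = (int(part) for part in text.split("."))
    return major, minor, maintenance


def release_type(old: str, new: str) -> str:
    """Classify the step from `old` to `new` as "major", "minor" or "maintenance"."""
    before, after = parse_release(old), parse_release(new)
    if after <= before:
        raise ValueError(f"{new} is not newer than {old}")
    if after[0] != before[0]:
        return "major"
    if after[1] != before[1]:
        return "minor"
    return "maintenance"


def leading_version(version: str) -> int:
    """Leading number of a library version, e.g. 15 for "15.0".

    Hypothetical simplification: a change in this number is treated as an
    incompatible upgrade.
    """
    return int(version.split(".")[0])


def dependency_violations(old_release: str, new_release: str,
                          old_deps: Dict[str, str],
                          new_deps: Dict[str, str]) -> List[str]:
    """List dependency changes that suggestions (1) and (2) reserve for bigger releases."""
    kind = release_type(old_release, new_release)
    problems = []
    for name, version in new_deps.items():
        if name not in old_deps:
            # Suggestion (2): new dependencies arrive with new features,
            # so they should not appear in a maintenance release.
            if kind == "maintenance":
                problems.append(f"new dependency {name} in a maintenance release")
        elif leading_version(version) != leading_version(old_deps[name]):
            # Suggestion (1): incompatible upgrades are reserved for major releases.
            if kind != "major":
                problems.append(
                    f"incompatible upgrade of {name} ({old_deps[name]} -> {version}) "
                    f"outside a major release")
    # Suggestion (5): names present only in old_deps were removed, which is always allowed.
    return problems


# Hypothetical dependency sets; library names and versions are invented.
old = {"libfoo": "2.2.3", "libbar": "14.0.1"}
new = {"libfoo": "2.2.3", "libbar": "15.0", "libbaz": "2.21"}
for target in ("1.0.1", "1.1.0", "2.0.0"):
    print(target, dependency_violations("1.0.0", target, old, new))
# 1.0.1 flags both libbar and libbaz, 1.1.0 flags only libbar, 2.0.0 flags nothing.
```

Suggestions (3) and (4) are left out on purpose: judging whether a dependency's scope is proportional to its benefit, or which version is the most recent generally available stable one, needs human judgment or information this sketch does not model.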
