Proposal for Spark Release Strategy

Patrick Wendell Wed, 05 Feb 2014 16:21:09 -0800

Hi Everyone,

In an effort to coordinate development amongst the growing list of
Spark contributors, I've taken some time to write up a proposal to
formalize various pieces of the development process. The next release
of Spark will likely be Spark 1.0.0, so this message is intended in
part to coordinate the release plan for 1.0.0 and future releases.
After discussing it on this thread, I'll post it on the wiki as
tentative project guidelines.


== Spark Release Structure ==
Starting with Spark 1.0.0, the Spark project will follow the semantic
versioning guidelines (http://semver.org/) with a few deviations.
These small differences account for Spark's nature as a multi-module
project.

Each Spark release will be versioned:
[MAJOR].[MINOR].[MAINTENANCE]

All releases with the same major version number will be API
compatible, as defined in [1]. Major version numbers will remain
stable over long periods of time. For instance, 1.X.Y may last 1 year
or more.
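
To make the versioning rule concrete, here is a minimal Python sketch.
The names SparkVersion and api_compatible are hypothetical and not part
of Spark. It assumes the rule only applies from 1.0.0 onward, so
pre-1.0 releases are never reported as compatible.

```python
from typing import NamedTuple


class SparkVersion(NamedTuple):
    """Hypothetical helper for a [MAJOR].[MINOR].[MAINTENANCE] version."""
    major: int
    minor: int
    maintenance: int

    @classmethod
    def parse(cls, text: str) -> "SparkVersion":
        parts = text.strip().split(".")
        if len(parts) != 3:
            raise ValueError(f"expected MAJOR.MINOR.MAINTENANCE, got {text!r}")
        return cls(*(int(p) for p in parts))


def api_compatible(a: str, b: str) -> bool:
    """True if both releases share a major version number.

    Assumption: the guarantee starts with 1.0.0, so major version 0
    is excluded.
    """
    va, vb = SparkVersion.parse(a), SparkVersion.parse(b)
    return va.major == vb.major and va.major >= 1
```

For example, api_compatible("1.0.0", "1.3.2") is true, while
api_compatible("1.9.0", "2.0.0") is false.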

Minor releases will typically contain new features and improvements.
The target frequency for minor releases is every 3-4 months. One
change we'd like to make is to announce fixed release dates and merge
windows for each release, to facilitate coordination. Each minor
release will have a merge window where new patches can be merged, a QA
window when only fixes can be merged, then a final period where voting
occurs on release candidates. These windows will be announced
immediately after the previous minor release to give people plenty of
time, and over time, we might make the whole release process more
regular (similar to Ubuntu). At the bottom of this document is an
example window for the 1.0.0 release.

Maintenance releases will occur more frequently and depend on specific
patches introduced (e.g. bug fixes) and their urgency. In general
these releases are designed to patch bugs. However, higher level
libraries may introduce small features, such as a new algorithm,
provided they are entirely additive and isolated from existing code
paths. Spark core may not introduce any features.
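
The rules for what each kind of release may contain can be sketched in
Python as below. The function names and the component label "core" are
hypothetical. The sketch also assumes that major releases may add
features, which the proposal does not state explicitly.

```python
def release_kind(previous: str, new: str) -> str:
    """Classify a release by which version field changed."""
    p = tuple(int(x) for x in previous.split("."))
    n = tuple(int(x) for x in new.split("."))
    if len(p) != 3 or len(n) != 3:
        raise ValueError("versions must look like MAJOR.MINOR.MAINTENANCE")
    if n <= p:
        raise ValueError("new version must be later than the previous one")
    if n[0] != p[0]:
        return "major"
    if n[1] != p[1]:
        return "minor"
    return "maintenance"


def feature_allowed(kind: str, component: str,
                    additive_and_isolated: bool = False) -> bool:
    """Decide whether a new feature may go into a release of this kind."""
    if kind != "maintenance":
        # Minor releases carry new features; treating major releases
        # the same way is an assumption of this sketch.
        return True
    if component == "core":
        # Spark core may not introduce any features in a maintenance release.
        return False
    # Higher level libraries may add small features only if they are
    # entirely additive and isolated from existing code paths.
    return additive_and_isolated
```

For example, going from 1.0.0 to 1.0.1 is a maintenance release, so a
new algorithm in a higher level library is allowed only when it is
additive and isolated, and a new core feature is not allowed at all.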

When new components are added to Spark, they may initially be marked
as "alpha". Alpha components do not have to abide by the above
guidelines, but they should try to as far as possible. Once they are
marked "stable", they have to follow these guidelines. At present,
GraphX is the only alpha component of Spark.

[1] API compatibility:

An API is any public class or interface exposed in Spark that is not
marked as semi-private or experimental. Release A is API compatible
with release B if code compiled against release A *compiles cleanly*
against B. This does not guarantee that a compiled application that is
linked against version A will link cleanly against version B without
re-compiling. We will try to guarantee link-level compatibility as
well, and we might make it a requirement in the future, but challenges
with things like Scala versions have made this difficult to guarantee
in the past.

== Merging Pull Requests ==
To merge pull requests, committers are encouraged to use this tool [2]
to collapse the request into one commit rather than manually
performing git merges. It will also format the commit message nicely
in a way that can be easily parsed later when writing credits.
Currently it is maintained in a public utility repository, but we'll
merge it into mainline Spark soon.

[2] https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py

== Tentative Release Window for 1.0.0 ==
Feb 1st - April 1st: General development
April 1st: Code freeze for new features
April 15th: RC1
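
As a rough illustration, the window above can be expressed as a small
Python lookup. The function name and phase labels are hypothetical.
The year 2014 is an assumption taken from the date of this message,
and treating April 1st itself as the first day of the freeze is also
an assumption.

```python
from datetime import date

# Dates from the tentative 1.0.0 window; the year 2014 is assumed.
DEV_START = date(2014, 2, 1)
FEATURE_FREEZE = date(2014, 4, 1)
RC1 = date(2014, 4, 15)


def phase_on(day: date) -> str:
    """Return the release phase that a given day falls into."""
    if day < DEV_START:
        return "before window"
    if day < FEATURE_FREEZE:
        return "general development"
    if day < RC1:
        # After the code freeze only fixes are merged (the QA window).
        return "fixes only"
    return "release candidate voting"
```

For example, phase_on(date(2014, 3, 10)) falls in general development,
and phase_on(date(2014, 4, 5)) falls in the fixes-only period.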

== Deviations ==
For now, the proposal is to consider these tentative guidelines. We
can vote to formalize these as project rules at a later time after
some experience working with them. Once formalized, any deviation from
these guidelines will be subject to a lazy majority vote.

- Patrick
