Hi Everyone,

In an effort to coordinate development amongst the growing list of Spark contributors, I've taken some time to write up a proposal to formalize various pieces of the development process. The next release of Spark will likely be Spark 1.0.0, so this message is intended in part to coordinate the release plan for 1.0.0 and future releases. I'll post this on the wiki after discussing it on this thread as tentative project guidelines.
== Spark Release Structure ==
Starting with Spark 1.0.0, the Spark project will follow the semantic versioning guidelines (http://semver.org/) with a few deviations. These small differences account for Spark's nature as a multi-module project.

Each Spark release will be versioned:
[MAJOR].[MINOR].[MAINTENANCE]

All releases with the same major version number will have API compatibility, as defined in [1]. Major version numbers will remain stable over long periods of time. For instance, 1.X.Y may last 1 year or more.

Minor releases will typically contain new features and improvements. The target frequency for minor releases is every 3-4 months. One change we'd like to make is to announce fixed release dates and merge windows for each release, to facilitate coordination. Each minor release will have a merge window where new patches can be merged, a QA window when only fixes can be merged, then a final period where voting occurs on release candidates. These windows will be announced immediately after the previous minor release to give people plenty of time. Over time, we might make the whole release process more regular (similar to Ubuntu). At the bottom of this document is an example window for the 1.0.0 release.

Maintenance releases will occur more frequently and depend on specific patches introduced (e.g. bug fixes) and their urgency. In general these releases are designed to patch bugs. However, higher level libraries may introduce small features, such as a new algorithm, provided they are entirely additive and isolated from existing code paths. Spark core may not introduce any features in maintenance releases.

When new components are added to Spark, they may initially be marked as "alpha". Alpha components do not have to abide by the above guidelines, but they should try to, to the maximum extent possible. Once they are marked "stable" they have to follow these guidelines. At present, GraphX is the only alpha component of Spark.
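To make the versioning rules above concrete, here is a minimal sketch in Python. It is only an illustration of the proposal, not existing Spark tooling: the names `Version`, `parse_version`, `release_kind` and `promises_api_compatibility` are hypothetical, and the sketch ignores alpha components, which are exempt from the guidelines.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A [MAJOR].[MINOR].[MAINTENANCE] version (hypothetical helper type)."""
    major: int
    minor: int
    maintenance: int


def parse_version(text: str) -> Version:
    """Parse a string such as '1.0.0' into a Version."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.MAINTENANCE, got {text!r}")
    major, minor, maintenance = (int(p) for p in parts)
    return Version(major, minor, maintenance)


def release_kind(previous: Version, new: Version) -> str:
    """Classify `new` relative to `previous` as major, minor or maintenance."""
    if new <= previous:
        raise ValueError("new release must be later than the previous one")
    if new.major != previous.major:
        return "major"
    if new.minor != previous.minor:
        return "minor"
    return "maintenance"


def promises_api_compatibility(a: Version, b: Version) -> bool:
    """Under the proposal, releases sharing a major version are API compatible.

    This covers source-level compatibility as defined in [1] only; it says
    nothing about link-level compatibility.
    """
    return a.major == b.major
```

For example, `release_kind(parse_version("1.0.1"), parse_version("1.1.0"))` classifies 1.1.0 as a minor release, and `promises_api_compatibility` holds for 1.0.0 and 1.4.2 but not for 1.4.2 and 2.0.0.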
[1] API compatibility:

An API is any public class or interface exposed in Spark that is not marked as semi-private or experimental. Release A is API compatible with release B if code compiled against release A *compiles cleanly* against B. This does not guarantee that a compiled application that is linked against version A will link cleanly against version B without re-compiling. Link-level compatibility is something we'll try to guarantee as well, and we might make it a requirement in the future, but challenges with things like Scala versions have made this difficult to guarantee in the past.

== Merging Pull Requests ==
To merge pull requests, committers are encouraged to use this tool [2] to collapse the request into one commit rather than manually performing git merges. It also formats the commit message in a way that can be easily parsed later when writing credits. Currently it is maintained in a public utility repository, but we'll merge it into mainline Spark soon.

[2] https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py

== Tentative Release Window for 1.0.0 ==
Feb 1st - April 1st: General development
April 1st: Code freeze for new features
April 15th: RC1

== Deviations ==
For now, the proposal is to consider these tentative guidelines. We can vote to formalize these as project rules at a later time after some experience working with them. Once formalized, any deviation from these guidelines will be subject to a lazy majority vote.

- Patrick