+1 for 0.10.0. It would give more time to study things (such as the new SparkConf) and let the community decide if any breaking API changes are needed.
Also, a +1 for minor revisions not breaking code compatibility, including Scala versions. (I guess this would mean that 1.x would stay on Scala 2.10.x.)

On Thu, Feb 6, 2014 at 11:05 AM, Sandy Ryza <[email protected]> wrote:

> Bleh, hit send too early again. My second paragraph was to argue for 1.0.0 instead of 0.10.0, not to hammer on the binary compatibility point.
>
> On Thu, Feb 6, 2014 at 11:04 AM, Sandy Ryza <[email protected]> wrote:
>
>> *Would it make sense to put in something that strongly discourages binary incompatible changes when possible?
>>
>> On Thu, Feb 6, 2014 at 11:03 AM, Sandy Ryza <[email protected]> wrote:
>>
>>> Not codifying binary compatibility as a hard rule sounds fine to me. Would it make sense to put something in that strongly discourages binary incompatible changes when possible? I.e. avoid making needless changes to class hierarchies.
>>>
>>> Whether Spark considers itself stable or not, users are beginning to treat it so. A responsible project will acknowledge this and provide the stability needed by its user base. I think some projects have made the mistake of waiting too long to release a 1.0.0. It allows them to put off making the hard decisions, but users and downstream projects suffer.
>>>
>>> If Spark needs to go through dramatic changes, there's always the option of a 2.0.0 that allows for this.
>>>
>>> -Sandy
>>>
>>> On Thu, Feb 6, 2014 at 10:56 AM, Matei Zaharia <[email protected]> wrote:
>>>
>>>> I think it's important to do 1.0 next. The project has been around for 4 years, and I'd be comfortable maintaining the current codebase for a long time in an API and binary compatible way through 1.x releases. Over the past 4 years we haven't actually had major changes to the user-facing API -- the only ones were changing the package to org.apache.spark, and upgrading the Scala version. I'd be okay leaving 1.x to always use Scala 2.10, for example, or later cross-building it for Scala 2.11. Updating to 1.0 says two things: it tells users that they can be confident that version will be maintained for a long time, which we absolutely want to do, and it lets outsiders see that the project is now fairly mature (for many people, pre-1.0 might still cause them not to try it). I think both are good for the community.
>>>>
>>>> Regarding binary compatibility, I agree that it's what we should strive for, but it just seems premature to codify now. Let's see how it works between, say, 1.0 and 1.1, and then we can codify it.
>>>>
>>>> Matei
>>>>
>>>> On Feb 6, 2014, at 10:43 AM, Henry Saputra <[email protected]> wrote:
>>>>
>>>> > Thanks Patrick for initiating the discussion about the next road map for Apache Spark.
>>>> >
>>>> > I am +1 for 0.10.0 for the next version.
>>>> >
>>>> > It will give us as a community some time to digest the process and the vision and make adjustments accordingly.
>>>> >
>>>> > Releasing 1.0.0 is a huge milestone, and if we do need to break the API somehow or modify internal behavior dramatically, we could take advantage of a 1.0.0 release as a good step toward that.
>>>> >
>>>> > - Henry
>>>> >
>>>> > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash <[email protected]> wrote:
>>>> >> Agree on timeboxed releases as well.
>>>> >>
>>>> >> Is there a vision for where we want to be as a project before declaring the first 1.0 release?
>>>> >> While we're in the 0.x days per semver we can break backcompat at will (though we try to avoid it where possible), and that luxury goes away with 1.x. I just don't want to release a 1.0 simply because it seems to follow after 0.9, rather than making an intentional decision that we're at the point where we can stand by the current APIs and binary compatibility for the next year or so of the major release.
>>>> >>
>>>> >> Until that decision is made as a group, I'd rather we do an immediate version bump to 0.10.0-SNAPSHOT and then, if discussion warrants it later, replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to 1.0, but not the other way around.
>>>> >>
>>>> >> https://github.com/apache/incubator-spark/pull/542
>>>> >>
>>>> >> Cheers!
>>>> >> Andrew
>>>> >>
>>>> >> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun <[email protected]> wrote:
>>>> >>
>>>> >>> +1 on time boxed releases and compatibility guidelines
>>>> >>>
>>>> >>>> On 06.02.2014, at 01:20, Patrick Wendell <[email protected]> wrote:
>>>> >>>>
>>>> >>>> Hi Everyone,
>>>> >>>>
>>>> >>>> In an effort to coordinate development amongst the growing list of Spark contributors, I've taken some time to write up a proposal to formalize various pieces of the development process. The next release of Spark will likely be Spark 1.0.0, so this message is intended in part to coordinate the release plan for 1.0.0 and future releases. I'll post this on the wiki after discussing it on this thread as tentative project guidelines.
>>>> >>>>
>>>> >>>> == Spark Release Structure ==
>>>> >>>> Starting with Spark 1.0.0, the Spark project will follow the semantic versioning guidelines (http://semver.org/) with a few deviations. These small differences account for Spark's nature as a multi-module project.
>>>> >>>>
>>>> >>>> Each Spark release will be versioned:
>>>> >>>> [MAJOR].[MINOR].[MAINTENANCE]
>>>> >>>>
>>>> >>>> All releases with the same major version number will have API compatibility, as defined in [1]. Major version numbers will remain stable over long periods of time. For instance, 1.X.Y may last 1 year or more.
>>>> >>>>
>>>> >>>> Minor releases will typically contain new features and improvements. The target frequency for minor releases is every 3-4 months. One change we'd like to make is to announce fixed release dates and merge windows for each release, to facilitate coordination. Each minor release will have a merge window where new patches can be merged, a QA window when only fixes can be merged, and then a final period where voting occurs on release candidates. These windows will be announced immediately after the previous minor release to give people plenty of time, and over time we might make the whole release process more regular (similar to Ubuntu). At the bottom of this document is an example window for the 1.0.0 release.
>>>> >>>>
>>>> >>>> Maintenance releases will occur more frequently and depend on the specific patches introduced (e.g. bug fixes) and their urgency. In general these releases are designed to patch bugs.
>>>> >>>> However, higher-level libraries may introduce small features, such as a new algorithm, provided they are entirely additive and isolated from existing code paths. Spark core may not introduce any features.
>>>> >>>>
>>>> >>>> When new components are added to Spark, they may initially be marked as "alpha". Alpha components do not have to abide by the above guidelines; however, to the maximum extent possible, they should try to. Once they are marked "stable" they have to follow these guidelines. At present, GraphX is the only alpha component of Spark.
>>>> >>>>
>>>> >>>> [1] API compatibility:
>>>> >>>>
>>>> >>>> An API is any public class or interface exposed in Spark that is not marked as semi-private or experimental. Release A is API compatible with release B if code compiled against release A *compiles cleanly* against B. This does not guarantee that a compiled application that is linked against version A will link cleanly against version B without re-compiling. Link-level compatibility is something we'll try to guarantee as well, and we might make it a requirement in the future, but challenges with things like Scala versions have made this difficult to guarantee in the past.
>>>> >>>>
>>>> >>>> == Merging Pull Requests ==
>>>> >>>> To merge pull requests, committers are encouraged to use this tool [2] to collapse the request into one commit rather than manually performing git merges. It will also format the commit message nicely in a way that can be easily parsed later when writing credits. Currently it is maintained in a public utility repository, but we'll merge it into mainline Spark soon.
>>>> >>>>
>>>> >>>> [2] https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py
>>>> >>>>
>>>> >>>> == Tentative Release Window for 1.0.0 ==
>>>> >>>> Feb 1st - April 1st: General development
>>>> >>>> April 1st: Code freeze for new features
>>>> >>>> April 15th: RC1
>>>> >>>>
>>>> >>>> == Deviations ==
>>>> >>>> For now, the proposal is to consider these tentative guidelines. We can vote to formalize these as project rules at a later time, after some experience working with them. Once formalized, any deviation from these guidelines will be subject to a lazy majority vote.
>>>> >>>>
>>>> >>>> - Patrick

--
Evan Chan
Staff Engineer
[email protected] |
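
[Editor's note: to make the compile-versus-link distinction in definition [1] above concrete, here is a minimal Scala sketch. The Dataset class and persist method are hypothetical names for illustration, not actual Spark API.]

    // Sketch only: a hypothetical library class, not real Spark code.
    // --- Release A of the library looked like this: ---
    //   class Dataset { def persist(): Dataset = this }
    // --- Release B replaces the zero-arg method with a defaulted one: ---
    class Dataset {
      def persist(level: String = "MEMORY_ONLY"): Dataset = this
    }

    object App {
      def main(args: Array[String]): Unit = {
        val ds = new Dataset
        // This call compiles cleanly against both A and B, so the two
        // releases are API compatible in the sense defined in [1].
        ds.persist()
        // But an App.class compiled against release A invokes the bytecode
        // symbol persist() with no arguments, which no longer exists in
        // release B, so running the old jar against B fails with
        // NoSuchMethodError -- the link-level incompatibility the proposal
        // does not (yet) promise to prevent.
      }
    }

Default arguments are just one way to produce this compiles-but-won't-link pattern; adding methods to a trait or reshuffling class hierarchies can do the same, which is presumably why the proposal stops short of codifying link-level compatibility.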
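[Editor's note: as a small illustration of what the [MAJOR].[MINOR].[MAINTENANCE] scheme would mean for users, an application could pin its build to the 1.x line roughly as below. This is a hypothetical sbt snippet; the coordinates assume the org.apache.spark group and a Scala 2.10 cross-build remain in place for 1.x, as Matei suggests.]

    // build.sbt -- sketch under the assumptions above
    scalaVersion := "2.10.3"

    // Per the rule that releases sharing a major version stay API
    // compatible, code written against 1.0.0 should keep compiling
    // against any later 1.x.y release.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"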
