Imran:

> It's also fine with me if 1.0 is next; I just think that we ought to be
> asking these kinds of questions up and down the entire API before we
> release 1.0.


And moving master to 1.0.0-SNAPSHOT doesn't preclude that.  If anything, it
turns that "ought to" into "must" -- which is another way of saying what
Reynold said: "The point of 1.0 is for us to self-enforce API compatibility
in the context of longer term support. If we continue down the 0.xx road,
we will always have excuse for breaking APIs."

1.0.0-SNAPSHOT doesn't mean that the API is final right now.  It means that
what is released next will be final for what is intended to be the lengthy
lifespan of a major release.  That means that adding new features and
functionality (at least to core Spark) should be a very low priority for
this development cycle, and that establishing the 1.0 API from what is
already in 0.9.0 should be our first priority.  It wouldn't trouble me at
all if we did pretty much nothing else during this cycle except get the
1.0 API to where most of us agree that it is in good shape, leaving
not-strictly-necessary new features to hang out on the pull request queue
for quite a while, until we are ready to add them in 1.1.0.

If we're not adding new features and extending the 0.9.0 API, then there
really is no need for a 0.10.0 minor release, whose main purpose would be
to collect the API additions since 0.9.0.  Bug fixes go in 0.9.1-SNAPSHOT;
bug fixes and the finalized 1.0 API go in 1.0.0-SNAPSHOT; almost all new
features are put on hold and wait for 1.1.0-SNAPSHOT.
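
To make that branch/version mapping concrete, here is a rough sketch of
how the version strings would line up in the SBT build definitions.
These snippets are hypothetical -- the actual build files and
"incubating" suffixes are whatever the release manager sets:

    // branch-0.9 (maintenance): bug fixes only
    version := "0.9.1-incubating-SNAPSHOT"

    // master: bug fixes plus the finalized 1.0 API
    version := "1.0.0-incubating-SNAPSHOT"

    // after 1.0.0 ships, master reopens for new features
    version := "1.1.0-SNAPSHOT"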

> ... it seems possible that there could be new features we'd like to release
> in 0.10...


We certainly can add new features to 1.0.0, but they will have to go
through a rigorous review to be certain that they are things that we
really want to commit to keeping going forward.  After 1.0, though, that
is true for any new feature proposal unless we create specifically
experimental branches.  So what moving to 1.0.0-SNAPSHOT really means is
that we are declaring the end of the development phase in which
more-or-less experimental features could be added to Spark releases only
to be withdrawn later.  To be fair, tentative/experimental features have
not been added willy-nilly to Spark over recent releases, and
withdrawal/replacement has been about as limited in scope as could fairly
be expected, so this shouldn't be a radically new and different
development paradigm.  There are, though, some experiments that were added
in the past and should probably now be withdrawn (or at least deprecated
in 1.0.0 and withdrawn in 1.1.0).  I'll put my own contribution of
mapWith, filterWith, et al. on the chopping block as an effort that, at
least in its present form, doesn't provide enough beyond
mapPartitionsWithIndex, and whose syntax is awkward enough that I don't
believe these methods have ever been widely used; their inclusion in the
1.0 API is probably not warranted.
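
For anyone who hasn't run into these methods, here is a minimal sketch of
the kind of rewrite I mean (signatures approximate, and assuming rdd is
an RDD[Double]):

    // mapWith: builds a per-partition setup object from the partition index
    val jittered = rdd.mapWith((index: Int) => new scala.util.Random(index)) {
      (value, prng) => value + prng.nextDouble()
    }

    // The same thing with mapPartitionsWithIndex, which stays in the 1.0
    // API; the per-partition setup simply moves to the top of the closure.
    val jittered2 = rdd.mapPartitionsWithIndex { (index, iter) =>
      val prng = new scala.util.Random(index)
      iter.map(value => value + prng.nextDouble())
    }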

There are other elements of Spark that also should be culled and/or
refactored before 1.0.  Imran has listed a few. I'll also suggest that
there are at least parts of alternative Broadcast variable implementations
that should probably be left behind.  In any event, Imran is absolutely
correct that we need to have a discussion about these issues.  Moving to
1.0.0-SNAPSHOT forces us to begin that discussion.

So, I'm +1 for 1.0.0-incubating-SNAPSHOT (and looking forward to losing the
"incubating"!)




On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid <[email protected]> wrote:

> I don't really agree with this logic.  I think we haven't broken the API so
> far because we just keep adding stuff onto it, and we haven't bothered to
> clean the API up, specifically so as to *avoid* breaking things.  Here's a
> handful of API-breaking things that we might want to consider:
>
> * should we look at all the various configuration properties, and consider
> renaming some of them for consistency / clarity?
> * do all of the functions on RDD need to be in core?  Or do some of them
> that are simple additions built on top of the primitives really belong in
> a "utils" package or something?  E.g., maybe we should get rid of all the
> variants of mapPartitions / mapWith / etc. and just have map and
> mapPartitionsWithIndex (too many choices in the API can also be confusing
> to the user)
> * are the right things getting tracked in SparkListener?  Do we need to add
> or remove anything?
>
> This is probably not the right list of questions; that's just an idea of
> the kind of thing we should be thinking about.
>
> It's also fine with me if 1.0 is next; I just think that we ought to be
> asking these kinds of questions up and down the entire API before we
> release 1.0.  And given that we haven't even started that discussion, it
> seems possible that there could be new features we'd like to release in
> 0.10 before that discussion is finished.
>
>
>
> On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia <[email protected]>
> wrote:
>
> > I think it's important to do 1.0 next. The project has been around for 4
> > years, and I'd be comfortable maintaining the current codebase for a long
> > time in an API and binary compatible way through 1.x releases. Over the
> > past 4 years we haven't actually had major changes to the user-facing API
> > -- the only ones were changing the package to org.apache.spark, and
> > upgrading the Scala version. I'd be okay leaving 1.x to always use Scala
> > 2.10 for example, or later cross-building it for Scala 2.11. Updating to
> > 1.0 says two things: it tells users that they can be confident that
> > version will be maintained for a long time, which we absolutely want to
> > do, and it lets outsiders see that the project is now fairly mature (for
> > many people, pre-1.0 might still cause them not to try it). I think both
> > are good for the community.
> >
> > Regarding binary compatibility, I agree that it's what we should strive
> > for, but it just seems premature to codify now. Let's see how it works
> > between, say, 1.0 and 1.1, and then we can codify it.
> >
> > Matei
> >
> > On Feb 6, 2014, at 10:43 AM, Henry Saputra <[email protected]>
> > wrote:
> >
> > > Thanks, Patrick, for initiating the discussion about the next road map
> > > for Apache Spark.
> > >
> > > I am +1 for 0.10.0 for next version.
> > >
> > > It will give us as community some time to digest the process and the
> > > vision and make adjustment accordingly.
> > >
> > > Releasing 1.0.0 is a huge milestone, and if we do need to break the API
> > > somehow or modify internal behavior dramatically, we could take
> > > advantage of the 1.0.0 release as a good point to do that.
> > >
> > >
> > > - Henry
> > >
> > >
> > >
> > > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash <[email protected]>
> > > wrote:
> > >> Agree on timeboxed releases as well.
> > >>
> > >> Is there a vision for where we want to be as a project before declaring
> > >> the first 1.0 release?  While we're in the 0.x days per semver we can
> > >> break backcompat at will (though we try to avoid it where possible), and
> > >> that luxury goes away with 1.x.  I just don't want to release a 1.0
> > >> simply because it seems to follow after 0.9 rather than making an
> > >> intentional decision that we're at the point where we can stand by the
> > >> current APIs and binary compatibility for the next year or so of the
> > >> major release.
> > >>
> > >> Until that decision is made as a group I'd rather we do an immediate
> > >> version bump to 0.10.0-SNAPSHOT and then, if discussion warrants it
> > >> later, replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10
> > >> to 1.0 but not the other way around.
> > >>
> > >> https://github.com/apache/incubator-spark/pull/542
> > >>
> > >> Cheers!
> > >> Andrew
> > >>
> > >>
> > >> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun <[email protected]>
> > >> wrote:
> > >>
> > >>> +1 on time boxed releases and compatibility guidelines
> > >>>
> > >>>
> > >>>> On 06.02.2014 at 01:20, Patrick Wendell <[email protected]> wrote:
> > >>>>
> > >>>> Hi Everyone,
> > >>>>
> > >>>> In an effort to coordinate development amongst the growing list of
> > >>>> Spark contributors, I've taken some time to write up a proposal to
> > >>>> formalize various pieces of the development process. The next release
> > >>>> of Spark will likely be Spark 1.0.0, so this message is intended in
> > >>>> part to coordinate the release plan for 1.0.0 and future releases.
> > >>>> I'll post this on the wiki after discussing it on this thread as
> > >>>> tentative project guidelines.
> > >>>>
> > >>>> == Spark Release Structure ==
> > >>>> Starting with Spark 1.0.0, the Spark project will follow the semantic
> > >>>> versioning guidelines (http://semver.org/) with a few deviations.
> > >>>> These small differences account for Spark's nature as a multi-module
> > >>>> project.
> > >>>>
> > >>>> Each Spark release will be versioned:
> > >>>> [MAJOR].[MINOR].[MAINTENANCE]
> > >>>>
> > >>>> All releases with the same major version number will have API
> > >>>> compatibility, as defined in [1]. Major version numbers will remain
> > >>>> stable over long periods of time. For instance, 1.X.Y may last 1 year
> > >>>> or more.
> > >>>>
> > >>>> Minor releases will typically contain new features and improvements.
> > >>>> The target frequency for minor releases is every 3-4 months. One
> > >>>> change we'd like to make is to announce fixed release dates and merge
> > >>>> windows for each release, to facilitate coordination. Each minor
> > >>>> release will have a merge window where new patches can be merged, a QA
> > >>>> window when only fixes can be merged, then a final period where voting
> > >>>> occurs on release candidates. These windows will be announced
> > >>>> immediately after the previous minor release to give people plenty of
> > >>>> time, and over time, we might make the whole release process more
> > >>>> regular (similar to Ubuntu). At the bottom of this document is an
> > >>>> example window for the 1.0.0 release.
> > >>>>
> > >>>> Maintenance releases will occur more frequently and depend on specific
> > >>>> patches introduced (e.g. bug fixes) and their urgency. In general
> > >>>> these releases are designed to patch bugs. However, higher level
> > >>>> libraries may introduce small features, such as a new algorithm,
> > >>>> provided they are entirely additive and isolated from existing code
> > >>>> paths. Spark core may not introduce any features.
> > >>>>
> > >>>> When new components are added to Spark, they may initially be marked
> > >>>> as "alpha". Alpha components do not have to abide by the above
> > >>>> guidelines, however, to the maximum extent possible, they should try
> > >>>> to. Once they are marked "stable" they have to follow these
> > >>>> guidelines. At present, GraphX is the only alpha component of Spark.
> > >>>>
> > >>>> [1] API compatibility:
> > >>>>
> > >>>> An API is any public class or interface exposed in Spark that is not
> > >>>> marked as semi-private or experimental. Release A is API compatible
> > >>>> with release B if code compiled against release A *compiles cleanly*
> > >>>> against B. This does not guarantee that a compiled application that is
> > >>>> linked against version A will link cleanly against version B without
> > >>>> re-compiling. Link-level compatibility is something we'll try to
> > >>>> guarantee as well, and we might make it a requirement in the future,
> > >>>> but challenges with things like Scala versions have made this
> > >>>> difficult to guarantee in the past.
> > >>>>
> > >>>> == Merging Pull Requests ==
> > >>>> To merge pull requests, committers are encouraged to use this tool [2]
> > >>>> to collapse the request into one commit rather than manually
> > >>>> performing git merges. It will also format the commit message nicely
> > >>>> in a way that can be easily parsed later when writing credits.
> > >>>> Currently it is maintained in a public utility repository, but we'll
> > >>>> merge it into mainline Spark soon.
> > >>>>
> > >>>> [2] https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py
> > >>>>
> > >>>> == Tentative Release Window for 1.0.0 ==
> > >>>> Feb 1st - April 1st: General development
> > >>>> April 1st: Code freeze for new features
> > >>>> April 15th: RC1
> > >>>>
> > >>>> == Deviations ==
> > >>>> For now, the proposal is to consider these tentative guidelines. We
> > >>>> can vote to formalize these as project rules at a later time after
> > >>>> some experience working with them. Once formalized, any deviation from
> > >>>> these guidelines will be subject to a lazy majority vote.
> > >>>>
> > >>>> - Patrick
> > >>>
> >
> >
>
