Re: [SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Henry Saputra
Ok, JIRA ticket filed [1] for this one.

- Henry

[1] https://spark-project.atlassian.net/browse/SPARK-1070

On Sat, Feb 8, 2014 at 3:39 PM, Patrick Wendell  wrote:
> :P - I'm pretty sure this can be done but it will require some work -
> we already use the GitHub API in our merge script and we could hook
> something like that up with the Jenkins tests. Henry, maybe you could
> create a JIRA for this for Spark 1.0?
>
> - Patrick
>
> On Sat, Feb 8, 2014 at 3:20 PM, Mark Hamstra  wrote:
>> I know that it can be done -- which is different from saying that I know how 
>> to set it up.
>>
>>
>>> On Feb 8, 2014, at 2:57 PM, Henry Saputra  wrote:
>>>
>>> Patrick, do you know if there is a way to check whether a GitHub PR's
>>> subject/title contains a JIRA number and have Jenkins raise a warning if it does not?
>>>
>>> - Henry
>>>
 On Sat, Feb 8, 2014 at 12:56 PM, Patrick Wendell  
 wrote:
 Hey All,

 Thanks to everyone who participated in this thread. I've distilled
 feedback based on the discussion and wanted to summarize the
 conclusions:

 - People seem universally +1 on semantic versioning in general.

 - People seem universally +1 on having public merge windows for releases.

 - People seem universally +1 on a policy of having associated JIRAs
 with features.

 - Everyone believes link-level compatibility should be the goal. Some
 people think we should outright promise it now. Others think we should
 either not promise it or promise it later.
 --> Compromise: let's do one minor release 1.0->1.1 to convince
 ourselves this is possible (some issues with Scala traits will make
 this tricky). Then we can codify it in writing. I've created
 SPARK-1069 [1] to clearly establish that this is the goal for 1.X
 family of releases.

 - Some people think we should add particular features before having 1.0.
 --> Version 1.X indicates API stability rather than a feature set;
 this was clarified.
 --> That said, people still have several months to work on features if
 they really want to get them in for this release.

 I'm going to integrate this feedback and post a tentative version of
 the release guidelines to the wiki.

 With all this said, I would like to move the master version to
 1.0.0-SNAPSHOT as the main concerns with this have been addressed and
 clarified. This merely represents a tentative consensus and the
 release is still subject to a formal vote amongst PMC members.

 [1] https://spark-project.atlassian.net/browse/SPARK-1069

 - Patrick


Re: [SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Henry Saputra
:)

Sure thing. I will create a JIRA ticket for this.

Thx guys,

Henry

On Saturday, February 8, 2014, Patrick Wendell  wrote:

> :P - I'm pretty sure this can be done but it will require some work -
> we already use the GitHub API in our merge script and we could hook
> something like that up with the Jenkins tests. Henry, maybe you could
> create a JIRA for this for Spark 1.0?
>
> - Patrick
>
> On Sat, Feb 8, 2014 at 3:20 PM, Mark Hamstra wrote:
> > I know that it can be done -- which is different from saying that I know
> how to set it up.
> >
> >
> >> On Feb 8, 2014, at 2:57 PM, Henry Saputra wrote:
> >>
> >> Patrick, do you know if there is a way to check whether a GitHub PR's
> >> subject/title contains a JIRA number and have Jenkins raise a warning if it does not?
> >>
> >> - Henry
> >>
> >>> On Sat, Feb 8, 2014 at 12:56 PM, Patrick Wendell wrote:
> >>> Hey All,
> >>>
> >>> Thanks to everyone who participated in this thread. I've distilled
> >>> feedback based on the discussion and wanted to summarize the
> >>> conclusions:
> >>>
> >>> - People seem universally +1 on semantic versioning in general.
> >>>
> >>> - People seem universally +1 on having public merge windows for
> releases.
> >>>
> >>> - People seem universally +1 on a policy of having associated JIRAs
> >>> with features.
> >>>
> >>> - Everyone believes link-level compatibility should be the goal. Some
> >>> people think we should outright promise it now. Others think we should
> >>> either not promise it or promise it later.
> >>> --> Compromise: let's do one minor release 1.0->1.1 to convince
> >>> ourselves this is possible (some issues with Scala traits will make
> >>> this tricky). Then we can codify it in writing. I've created
> >>> SPARK-1069 [1] to clearly establish that this is the goal for 1.X
> >>> family of releases.
> >>>
> >>> - Some people think we should add particular features before having
> 1.0.
> >>> --> Version 1.X indicates API stability rather than a feature set;
> >>> this was clarified.
> >>> --> That said, people still have several months to work on features if
> >>> they really want to get them in for this release.
> >>>
> >>> I'm going to integrate this feedback and post a tentative version of
> >>> the release guidelines to the wiki.
> >>>
> >>> With all this said, I would like to move the master version to
> >>> 1.0.0-SNAPSHOT as the main concerns with this have been addressed and
> >>> clarified. This merely represents a tentative consensus and the
> >>> release is still subject to a formal vote amongst PMC members.
> >>>
> >>> [1] https://spark-project.atlassian.net/browse/SPARK-1069
> >>>
> >>> - Patrick
>


Re: [SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Patrick Wendell
:P - I'm pretty sure this can be done but it will require some work -
we already use the GitHub API in our merge script and we could hook
something like that up with the Jenkins tests. Henry, maybe you could
create a JIRA for this for Spark 1.0?

- Patrick
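
For reference, here is a minimal, purely illustrative sketch of the kind of title check such a Jenkins hook could run once a PR title has been fetched (for example by the existing merge script via the GitHub API). The object name, regex, and warning text below are hypothetical and not part of any existing Spark tooling:

    // Hypothetical sketch only: warn when a pull request title does not
    // reference a JIRA issue. Assumes the title has already been fetched
    // (e.g. by the merge script via the GitHub API) and is passed in as text.
    object PrTitleCheck {
      // Matches identifiers like "SPARK-1069" anywhere in the title.
      private val JiraId = """SPARK-\d+""".r

      def missingJiraId(title: String): Boolean =
        JiraId.findFirstIn(title).isEmpty

      def main(args: Array[String]): Unit = {
        val title = args.headOption.getOrElse("")
        if (missingJiraId(title)) {
          // Jenkins could surface this as a warning on the build or the PR.
          println(s"WARNING: PR title '$title' does not mention a SPARK-XXXX issue.")
        } else {
          println("PR title references a JIRA issue.")
        }
      }
    }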

On Sat, Feb 8, 2014 at 3:20 PM, Mark Hamstra  wrote:
> I know that it can be done -- which is different from saying that I know how 
> to set it up.
>
>
>> On Feb 8, 2014, at 2:57 PM, Henry Saputra  wrote:
>>
>> Patrick, do you know if there is a way to check whether a GitHub PR's
>> subject/title contains a JIRA number and have Jenkins raise a warning if it does not?
>>
>> - Henry
>>
>>> On Sat, Feb 8, 2014 at 12:56 PM, Patrick Wendell  wrote:
>>> Hey All,
>>>
>>> Thanks to everyone who participated in this thread. I've distilled
>>> feedback based on the discussion and wanted to summarize the
>>> conclusions:
>>>
>>> - People seem universally +1 on semantic versioning in general.
>>>
>>> - People seem universally +1 on having public merge windows for releases.
>>>
>>> - People seem universally +1 on a policy of having associated JIRAs
>>> with features.
>>>
>>> - Everyone believes link-level compatibility should be the goal. Some
>>> people think we should outright promise it now. Others think we should
>>> either not promise it or promise it later.
>>> --> Compromise: let's do one minor release 1.0->1.1 to convince
>>> ourselves this is possible (some issues with Scala traits will make
>>> this tricky). Then we can codify it in writing. I've created
>>> SPARK-1069 [1] to clearly establish that this is the goal for 1.X
>>> family of releases.
>>>
>>> - Some people think we should add particular features before having 1.0.
>>> --> Version 1.X indicates API stability rather than a feature set;
>>> this was clarified.
>>> --> That said, people still have several months to work on features if
>>> they really want to get them in for this release.
>>>
>>> I'm going to integrate this feedback and post a tentative version of
>>> the release guidelines to the wiki.
>>>
>>> With all this said, I would like to move the master version to
>>> 1.0.0-SNAPSHOT as the main concerns with this have been addressed and
>>> clarified. This merely represents a tentative consensus and the
>>> release is still subject to a formal vote amongst PMC members.
>>>
>>> [1] https://spark-project.atlassian.net/browse/SPARK-1069
>>>
>>> - Patrick


Re: [SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Mark Hamstra
I know that it can be done -- which is different from saying that I know how to 
set it up.


> On Feb 8, 2014, at 2:57 PM, Henry Saputra  wrote:
> 
> Patrick, do you know if there is a way to check whether a GitHub PR's
> subject/title contains a JIRA number and have Jenkins raise a warning if it does not?
> 
> - Henry
> 
>> On Sat, Feb 8, 2014 at 12:56 PM, Patrick Wendell  wrote:
>> Hey All,
>> 
>> Thanks to everyone who participated in this thread. I've distilled
>> feedback based on the discussion and wanted to summarize the
>> conclusions:
>> 
>> - People seem universally +1 on semantic versioning in general.
>> 
>> - People seem universally +1 on having public merge windows for releases.
>> 
>> - People seem universally +1 on a policy of having associated JIRAs
>> with features.
>> 
>> - Everyone believes link-level compatibility should be the goal. Some
>> people think we should outright promise it now. Others think we should
>> either not promise it or promise it later.
>> --> Compromise: let's do one minor release 1.0->1.1 to convince
>> ourselves this is possible (some issues with Scala traits will make
>> this tricky). Then we can codify it in writing. I've created
>> SPARK-1069 [1] to clearly establish that this is the goal for 1.X
>> family of releases.
>> 
>> - Some people think we should add particular features before having 1.0.
>> --> Version 1.X indicates API stability rather than a feature set;
>> this was clarified.
>> --> That said, people still have several months to work on features if
>> they really want to get them in for this release.
>> 
>> I'm going to integrate this feedback and post a tentative version of
>> the release guidelines to the wiki.
>> 
>> With all this said, I would like to move the master version to
>> 1.0.0-SNAPSHOT as the main concerns with this have been addressed and
>> clarified. This merely represents a tentative consensus and the
>> release is still subject to a formal vote amongst PMC members.
>> 
>> [1] https://spark-project.atlassian.net/browse/SPARK-1069
>> 
>> - Patrick


Re: [SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Henry Saputra
Patrick, do you know if there is a way to check whether a GitHub PR's
subject/title contains a JIRA number and have Jenkins raise a warning if it does not?

- Henry

On Sat, Feb 8, 2014 at 12:56 PM, Patrick Wendell  wrote:
> Hey All,
>
> Thanks to everyone who participated in this thread. I've distilled
> feedback based on the discussion and wanted to summarize the
> conclusions:
>
> - People seem universally +1 on semantic versioning in general.
>
> - People seem universally +1 on having public merge windows for releases.
>
> - People seem universally +1 on a policy of having associated JIRAs
> with features.
>
> - Everyone believes link-level compatibility should be the goal. Some
> people think we should outright promise it now. Others think we should
> either not promise it or promise it later.
> --> Compromise: let's do one minor release 1.0->1.1 to convince
> ourselves this is possible (some issues with Scala traits will make
> this tricky). Then we can codify it in writing. I've created
> SPARK-1069 [1] to clearly establish that this is the goal for 1.X
> family of releases.
>
> - Some people think we should add particular features before having 1.0.
> --> Version 1.X indicates API stability rather than a feature set;
> this was clarified.
> --> That said, people still have several months to work on features if
> they really want to get them in for this release.
>
> I'm going to integrate this feedback and post a tentative version of
> the release guidelines to the wiki.
>
> With all this said, I would like to move the master version to
> 1.0.0-SNAPSHOT as the main concerns with this have been addressed and
> clarified. This merely represents a tentative consensus and the
> release is still subject to a formal vote amongst PMC members.
>
> [1] https://spark-project.atlassian.net/browse/SPARK-1069
>
> - Patrick


Re: [SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Andy Konwinski
Thanks for the summary, Patrick. I'm glad that we discussed the options
before pulling the trigger on a version number update (my -1 had only been
about committing a major version update without thorough discussion).
IMO that's been addressed and, given the discussion, I'm changing to a +1
for 1.0.0.
On Feb 8, 2014 12:56 PM, "Patrick Wendell"  wrote:

> Hey All,
>
> Thanks to everyone who participated in this thread. I've distilled
> feedback based on the discussion and wanted to summarize the
> conclusions:
>
> - People seem universally +1 on semantic versioning in general.
>
> - People seem universally +1 on having public merge windows for releases.
>
> - People seem universally +1 on a policy of having associated JIRAs
> with features.
>
> - Everyone believes link-level compatibility should be the goal. Some
> people think we should outright promise it now. Others think we should
> either not promise it or promise it later.
> --> Compromise: let's do one minor release 1.0->1.1 to convince
> ourselves this is possible (some issues with Scala traits will make
> this tricky). Then we can codify it in writing. I've created
> SPARK-1069 [1] to clearly establish that this is the goal for 1.X
> family of releases.
>
> - Some people think we should add particular features before having 1.0.
> --> Version 1.X indicates API stability rather than a feature set;
> this was clarified.
> --> That said, people still have several months to work on features if
> they really want to get them in for this release.
>
> I'm going to integrate this feedback and post a tentative version of
> the release guidelines to the wiki.
>
> With all this said, I would like to move the master version to
> 1.0.0-SNAPSHOT as the main concerns with this have been addressed and
> clarified. This merely represents a tentative consensus and the
> release is still subject to a formal vote amongst PMC members.
>
> [1] https://spark-project.atlassian.net/browse/SPARK-1069
>
> - Patrick
>


[SUMMARY] Proposal for Spark Release Strategy

2014-02-08 Thread Patrick Wendell
Hey All,

Thanks to everyone who participated in this thread. I've distilled
feedback based on the discussion and wanted to summarize the
conclusions:

- People seem universally +1 on semantic versioning in general.

- People seem universally +1 on having public merge windows for releases.

- People seem universally +1 on a policy of having associated JIRAs
with features.

- Everyone believes link-level compatibility should be the goal. Some
people think we should outright promise it now. Others think we should
either not promise it or promise it later.
--> Compromise: let's do one minor release 1.0->1.1 to convince
ourselves this is possible (some issues with Scala traits will make
this tricky). Then we can codify it in writing. I've created
SPARK-1069 [1] to clearly establish that this is the goal for 1.X
family of releases.

- Some people think we should add particular features before having 1.0.
--> Version 1.X indicates API stability rather than a feature set;
this was clarified.
--> That said, people still have several months to work on features if
they really want to get them in for this release.

I'm going to integrate this feedback and post a tentative version of
the release guidelines to the wiki.

With all this said, I would like to move the master version to
1.0.0-SNAPSHOT as the main concerns with this have been addressed and
clarified. This merely represents a tentative consensus and the
release is still subject to a formal vote amongst PMC members.

[1] https://spark-project.atlassian.net/browse/SPARK-1069

- Patrick


Re: Proposal for Spark Release Strategy

2014-02-07 Thread Patrick Wendell
Will,

Thanks for these thoughts - this is something we should try to be
attentive to in the way we think about versioning.

(2)-(5) are pretty consistent with the guidelines we already follow. I
think the biggest proposed difference is to be conscious of (1), which
at least I had not given much thought to in the past. Specifically, if
we make major version upgrades of dependencies within a major release
of Spark, it can cause issues for downstream packagers. I can't easily
recall how often we do this or whether this will be hard for us to
guarantee (maybe others can...). It's something to keep in mind though
- thanks for bringing it up.

- Patrick

On Fri, Feb 7, 2014 at 10:28 AM, Will Benton  wrote:
> Semantic versioning is great, and I think the proposed extensions for 
> adopting it in Spark make a lot of sense.  However, by focusing strictly on 
> public APIs, semantic versioning only solves part of the problem (albeit 
> certainly the most interesting part).  I'd like to raise another issue that 
> the semantic versioning guidelines explicitly exclude: the relative stability 
> of dependencies and dependency versions.  This is less of a concern for 
> end-users than it is for downstream packagers, but I believe that the 
> relative stability of a dependency stack *should* be part of what is implied 
> by a major version number.
>
> Here are some suggestions for how to incorporate dependency stack versioning 
> into semantic versioning in order to make life easier for downstreams; please 
> consider all of these to be prefaced with "If at all possible,":
>
> 1.  Switching a dependency to an incompatible version should be reserved for 
> major releases.  In general, downstream operating system distributions 
> support only one version of each library, although in rare cases alternate 
> versions are available for backwards compatibility.  If a bug fix or feature 
> addition in a patch or minor release depends on adopting a version of some 
> library that is incompatible with the one used by the prior patch or minor 
> release, then downstreams may not be able to incorporate the fix or 
> functionality until every package impacted by the dependency can be updated 
> to work with the new version.
>
> 2.  New dependencies should only be introduced with new features (and thus 
> with new minor versions).  This suggestion is probably uncontroversial, since 
> features are more likely than bugfixes to require additional external 
> libraries.
>
> 3.  The scope of new dependencies should be proportional to the benefit that 
> they provide.  Of course, we want to avoid reinventing the wheel, but if the 
> alternative is pulling in a framework for WheelFactory generation, a 
> WheelContainer library, and a dozen transitive dependencies, maybe it's worth 
> considering reinventing at least the simplest and least general wheels.
>
> 4.  If new functionality requires additional dependencies, it should be 
> developed to work with the most recent stable version of those libraries that 
> is generally available.  Again, since downstreams typically support only one 
> version per library at a time, this will make their job easier.  (This will 
> benefit everyone, though, since the most recent version of some dependency is 
> more likely to see active maintenance efforts.)
>
> 5.  Dependencies can be removed at any time.
>
> I hope these can be a starting point for further discussion and adoption of 
> practices that demarcate the scope of dependency changes in a given version 
> stream.
>
>
>
> best,
> wb
>
>
> - Original Message -
>> From: "Patrick Wendell" 
>> To: dev@spark.incubator.apache.org
>> Sent: Wednesday, February 5, 2014 6:20:10 PM
>> Subject: Proposal for Spark Release Strategy
>>
>> Hi Everyone,
>>
>> In an effort to coordinate development amongst the growing list of
>> Spark contributors, I've taken some time to write up a proposal to
>> formalize various pieces of the development process. The next release
>> of Spark will likely be Spark 1.0.0, so this message is intended in
>> part to coordinate the release plan for 1.0.0 and future releases.
>> I'll post this on the wiki after discussing it on this thread as
>> tentative project guidelines.
>>
>> == Spark Release Structure ==
>> Starting with Spark 1.0.0, the Spark project will follow the semantic
>> versioning guidelines (http://semver.org/) with a few deviations.
>> These small differences account for Spark's nature as a multi-module
>> project.
>>
>> Each Spark release will be versioned:
>> [MAJOR].[MINOR].[MAINTENANCE]
>>
>> All releases with the same major version num

Re: Proposal for Spark Release Strategy

2014-02-07 Thread Will Benton
Semantic versioning is great, and I think the proposed extensions for adopting 
it in Spark make a lot of sense.  However, by focusing strictly on public APIs, 
semantic versioning only solves part of the problem (albeit certainly the most 
interesting part).  I'd like to raise another issue that the semantic 
versioning guidelines explicitly exclude: the relative stability of 
dependencies and dependency versions.  This is less of a concern for end-users 
than it is for downstream packagers, but I believe that the relative stability 
of a dependency stack *should* be part of what is implied by a major version 
number.

Here are some suggestions for how to incorporate dependency stack versioning 
into semantic versioning in order to make life easier for downstreams; please 
consider all of these to be prefaced with "If at all possible,":

1.  Switching a dependency to an incompatible version should be reserved for 
major releases.  In general, downstream operating system distributions support 
only one version of each library, although in rare cases alternate versions are 
available for backwards compatibility.  If a bug fix or feature addition in a 
patch or minor release depends on adopting a version of some library that is 
incompatible with the one used by the prior patch or minor release, then 
downstreams may not be able to incorporate the fix or functionality until every 
package impacted by the dependency can be updated to work with the new version.

2.  New dependencies should only be introduced with new features (and thus with 
new minor versions).  This suggestion is probably uncontroversial, since 
features are more likely than bugfixes to require additional external libraries.

3.  The scope of new dependencies should be proportional to the benefit that 
they provide.  Of course, we want to avoid reinventing the wheel, but if the 
alternative is pulling in a framework for WheelFactory generation, a 
WheelContainer library, and a dozen transitive dependencies, maybe it's worth 
considering reinventing at least the simplest and least general wheels.

4.  If new functionality requires additional dependencies, it should be 
developed to work with the most recent stable version of those libraries that 
is generally available.  Again, since downstreams typically support only one 
version per library at a time, this will make their job easier.  (This will 
benefit everyone, though, since the most recent version of some dependency is 
more likely to see active maintenance efforts.)

5.  Dependencies can be removed at any time.

I hope these can be a starting point for further discussion and adoption of 
practices that demarcate the scope of dependency changes in a given version 
stream.



best,
wb


- Original Message -
> From: "Patrick Wendell" 
> To: dev@spark.incubator.apache.org
> Sent: Wednesday, February 5, 2014 6:20:10 PM
> Subject: Proposal for Spark Release Strategy
> 
> Hi Everyone,
> 
> In an effort to coordinate development amongst the growing list of
> Spark contributors, I've taken some time to write up a proposal to
> formalize various pieces of the development process. The next release
> of Spark will likely be Spark 1.0.0, so this message is intended in
> part to coordinate the release plan for 1.0.0 and future releases.
> I'll post this on the wiki after discussing it on this thread as
> tentative project guidelines.
> 
> == Spark Release Structure ==
> Starting with Spark 1.0.0, the Spark project will follow the semantic
> versioning guidelines (http://semver.org/) with a few deviations.
> These small differences account for Spark's nature as a multi-module
> project.
> 
> Each Spark release will be versioned:
> [MAJOR].[MINOR].[MAINTENANCE]
> 
> All releases with the same major version number will have API
> compatibility, defined as [1]. Major version numbers will remain
> stable over long periods of time. For instance, 1.X.Y may last 1 year
> or more.
> 
> Minor releases will typically contain new features and improvements.
> The target frequency for minor releases is every 3-4 months. One
> change we'd like to make is to announce fixed release dates and merge
> windows for each release, to facilitate coordination. Each minor
> release will have a merge window where new patches can be merged, a QA
> window when only fixes can be merged, then a final period where voting
> occurs on release candidates. These windows will be announced
> immediately after the previous minor release to give people plenty of
> time, and over time, we might make the whole release process more
> regular (similar to Ubuntu). At the bottom of this document is an
> example window for the 1.0.0 release.
> 
> Maintenance releases will occur more frequently and depend on specific
> patches introduced (e.g. bug fixes) and 
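
As an aside, here is a small, hypothetical sketch of what this scheme would mean for a downstream user's sbt build; the spark-core coordinates are the usual ones, but the version numbers and comments simply restate the proposal above rather than any published policy:

    // build.sbt sketch, illustrative only.
    scalaVersion := "2.10.3"

    // Under the proposed scheme, code compiled against 1.0.0 should keep
    // working as this version is bumped anywhere within the 1.x line:
    //   1.0.1 (maintenance) -- bug fixes only
    //   1.1.0 (minor)       -- new features, API preserved
    //   2.0.0 (major)       -- the only place API breakage is allowed
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"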

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Mark Hamstra
I'm not sure that that is the conclusion that I would draw from the Hadoop
example.  I would certainly agree that maintaining and supporting both an
old and a new API is a cause of endless confusion for users.  If we are
going to change or drop things from the API to reach 1.0, then we shouldn't
be maintaining and supporting the prior way of doing things beyond a 1.0.0 ->
1.1.0 deprecation cycle.


On Thu, Feb 6, 2014 at 12:49 PM, Sandy Ryza  wrote:

> If the APIs are usable, stability and continuity are much more important
> than perfection.  With many already relying on the current APIs, I think
> trying to clean them up will just cause pain for users and integrators.
>  Hadoop made this mistake when they decided the original MapReduce APIs
> were ugly and introduced a new set of APIs to do the same thing.  Even
> though this happened in a pre-1.0 release, three years down the road, both
> the old and new APIs are still supported, causing endless confusion for
> users.  If individual functions or configuration properties have unclear
> names, they can be deprecated and replaced, but redoing the APIs or
> breaking compatibility at this point is simply not worth it.
>
>
> On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid 
> wrote:
>
> > I don't really agree with this logic.  I think we haven't broken API so
> far
> > because we just keep adding stuff on to it, and we haven't bothered to
> > clean the api up, specifically to *avoid* breaking things.  Here's a
> > handful of api breaking things that we might want to consider:
> >
> > * should we look at all the various configuration properties, and maybe
> > some of them should get renamed for consistency / clarity?
> > * do all of the functions on RDD need to be in core?  or do some of them
> > that are simple additions built on top of the primitives really belong
> in a
> > "utils" package or something?  Eg., maybe we should get rid of all the
> > variants of the mapPartitions / mapWith / etc.  just have map, and
> > mapPartitionsWithIndex  (too many choices in the api can also be
> confusing
> > to the user)
> > * are the right things getting tracked in SparkListener?  Do we need to
> add
> > or remove anything?
> >
> > This is probably not the right list of questions, that's just an idea of
> > the kind of thing we should be thinking about.
> >
> > It's also fine with me if 1.0 is next, I just think that we ought to be
> > asking these kinds of questions up and down the entire api before we
> > release 1.0.  And given that we haven't even started that discussion, it
> > seems possible that there could be new features we'd like to release in
> > 0.10 before that discussion is finished.
> >
> >
> >
> > On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia wrote:
> >
> > > I think it's important to do 1.0 next. The project has been around for
> 4
> > > years, and I'd be comfortable maintaining the current codebase for a
> long
> > > time in an API and binary compatible way through 1.x releases. Over the
> > > past 4 years we haven't actually had major changes to the user-facing
> > API --
> > > the only ones were changing the package to org.apache.spark, and
> > upgrading
> > > the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for
> > > example, or later cross-building it for Scala 2.11. Updating to 1.0
> says
> > > two things: it tells users that they can be confident that version will
> > be
> > > maintained for a long time, which we absolutely want to do, and it lets
> > > outsiders see that the project is now fairly mature (for many people,
> > > pre-1.0 might still cause them not to try it). I think both are good
> for
> > > the community.
> > >
> > > Regarding binary compatibility, I agree that it's what we should strive
> > > for, but it just seems premature to codify now. Let's see how it works
> > > between, say, 1.0 and 1.1, and then we can codify it.
> > >
> > > Matei
> > >
> > > On Feb 6, 2014, at 10:43 AM, Henry Saputra 
> > > wrote:
> > >
> > > > Thanks Patrick for initiating the discussion about next road map for
> Apache
> > > Spark.
> > > >
> > > > I am +1 for 0.10.0 for next version.
> > > >
> > > > It will give us as community some time to digest the process and the
> > > > vision and make adjustment accordingly.
> > > >
> > > > Release a 1.0.0 is a huge milestone and if we do need to break API
> > > > somehow or modify internal behavior dramatically we could take
> > > > advantage to release 1.0.0 as good step to go to.
> > > >
> > > >
> > > > - Henry
> > > >
> > > >
> > > >
> > > > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash 
> > wrote:
> > > >> Agree on timeboxed releases as well.
> > > >>
> > > >> Is there a vision for where we want to be as a project before
> > declaring
> > > the
> > > >> first 1.0 release?  While we're in the 0.x days per semver we can
> > break
> > > >> backcompat at will (though we try to avoid it where possible), and
> > that
> > >> luxury goes away with 1.x.  I just don't want to release a 1.0 simply
> > > >

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Matei Zaharia
I think these are good questions to bring up, Imran. Here are my thoughts on 
them (I’ve thought about some of these in the past):

On Feb 6, 2014, at 12:39 PM, Imran Rashid  wrote:

> I don't really agree with this logic.  I think we haven't broken API so far
> because we just keep adding stuff on to it, and we haven't bothered to
> clean the api up, specifically to *avoid* breaking things.  Here's a
> handful of api breaking things that we might want to consider:
> 
> * should we look at all the various configuration properties, and maybe
> some of them should get renamed for consistency / clarity?

I know that some names are suboptimal, but I absolutely detest breaking APIs, 
config names, etc. I’ve seen it happen way too often in other projects (even 
things we depend on that are officially post-1.0, like Akka or Protobuf or 
Hadoop), and it’s very painful. I think that we as fairly cutting-edge users 
are okay with libraries occasionally changing, but many others will consider it 
a show-stopper. Given this, I think that any cosmetic change now, even though 
it might improve clarity slightly, is not worth the tradeoff in terms of 
creating an update barrier for existing users.

> * do all of the functions on RDD need to be in core?  or do some of them
> that are simple additions built on top of the primitives really belong in a
> "utils" package or something?  Eg., maybe we should get rid of all the
> variants of the mapPartitions / mapWith / etc.  just have map, and
> mapPartitionsWithIndex  (too many choices in the api can also be confusing
> to the user)

Again, for the reason above, I’d keep them where they are and consider adding 
other stuff later. Also personally I want to optimize the API for usability, 
not for Spark developers. If it’s easier for a user to call RDD.mapPartitions 
instead of AdvancedUtils.mapPartitions(rdd, func), and the only cost is a 
longer RDD.scala class, I’d go for the former. If you think there are some API 
methods that should just go away, that would be good to discuss — we can 
deprecate them for example.

> * are the right things getting tracked in SparkListener?  Do we need to add
> or remove anything?

This is an API that will probably be experimental or semi-private at first.

Anyway, as I said, these are good questions — I’d be happy to see suggestions 
on any of these fronts. I just wanted to point out the importance of 
compatibility. I think it’s been awesome that most of our users have been able 
to keep up with the latest version of Spark, getting all the new fixes and 
simultaneously increasing the amount of contributions we get on master and 
decreasing the backporting burden on old branches. We might take it for 
granted, but I’ve seen similar projects that didn't manage to do this. In 
particular, compatibility in Hadoop has been a mess, with some major users 
diverging from Apache early (e.g. Facebook) and never being able to contribute 
back, and with big API cleanups (e.g. mapred -> mapreduce) being proposed after 
the project already had a lot of momentum and never making it through. The 
experience of seeing those has made me very conservative. The longer we can 
keep a unified community, the better it will be for all users of the project.

Matei

> 
> This is probably not the right list of questions, that's just an idea of
> the kind of thing we should be thinking about.
> 
> It's also fine with me if 1.0 is next, I just think that we ought to be
> asking these kinds of questions up and down the entire api before we
> release 1.0.  And given that we haven't even started that discussion, it
> seems possible that there could be new features we'd like to release in
> 0.10 before that discussion is finished.
> 
> 
> 
> On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia wrote:
> 
>> I think it's important to do 1.0 next. The project has been around for 4
>> years, and I'd be comfortable maintaining the current codebase for a long
>> time in an API and binary compatible way through 1.x releases. Over the
>> past 4 years we haven't actually had major changes to the user-facing API --
>> the only ones were changing the package to org.apache.spark, and upgrading
>> the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for
>> example, or later cross-building it for Scala 2.11. Updating to 1.0 says
>> two things: it tells users that they can be confident that version will be
>> maintained for a long time, which we absolutely want to do, and it lets
>> outsiders see that the project is now fairly mature (for many people,
>> pre-1.0 might still cause them not to try it). I think both are good for
>> the community.
>> 
>> Regarding binary compatibility, I agree that it's what we should strive
>> for, but it just seems premature to codify now. Let's see how it works
>> between, say, 1.0 and 1.1, and then we can codify it.
>> 
>> Matei
>> 
>> On Feb 6, 2014, at 10:43 AM, Henry Saputra 
>> wrote:
>> 
>>> Thanks Patrick for initiating the discussion about ne

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Mark Hamstra
Imran:

> It's also fine with me if 1.0 is next, I just think that we ought to be
> asking these kinds of questions up and down the entire api before we
> release 1.0.


And moving master to 1.0.0-SNAPSHOT doesn't preclude that.  If anything, it
turns that "ought to" into "must" -- which is another way of saying what
Reynold said: "The point of 1.0 is for us to self-enforce API compatibility
in the context of longer term support. If we continue down the 0.xx road,
we will always have an excuse for breaking APIs."

1.0.0-SNAPSHOT doesn't mean that the API is final right now.  It means that
what is released next will be final over what is intended to be the lengthy
scope of a major release.  That means that adding new features and
functionality (at least to core spark) should be a very low priority for
this development cycle, and establishing the 1.0 API from what is already
in 0.9.0 should be our first priority.  It wouldn't trouble me at all if
not-strictly-necessary new features were left to hang out on the pull
request queue for quite a while until we are ready to add them in 1.1.0, if
we were to do pretty much nothing else during this cycle except to get the
1.0 API to where most of us agree that it is in good shape.

If we're not adding new features and extending the 0.9.0 API, then there
really is no need for a 0.10.0 minor-release, whose main purpose would be
to collect the API additions from 0.9.0.  Bug-fixes go in 0.9.1-SNAPSHOT;
bug-fixes and finalized 1.0 API go in 1.0.0-SNAPSHOT; almost all new
features are put on hold and wait for 1.1.0-SNAPSHOT.

... it seems possible that there could be new features we'd like to release
> in 0.10...


We certainly can add new features to 1.0.0, but they will have to go
through a rigorous review to be certain that they are things that we really
want to commit to keeping going forward.  But after 1.0, that is true for
any new feature proposal unless we create specifically experimental
branches.  So what moving to 1.0.0-SNAPSHOT really means is that we are
saying that we have gone beyond the development phase where more-or-less
experimental features can be added to Spark releases only to be withdrawn
later -- that time is done after 1.0.0-SNAPSHOT.  Now to be fair,
tentative/experimental features have not been added willy-nilly to Spark
over recent releases, and withdrawal/replacement has been about as limited
in scope as could be fairly expected, so this shouldn't be a radically new
and different development paradigm.  There are, though, some experiments
that were added in the past and should probably now be withdrawn (or at
least deprecated in 1.0.0, withdrawn in 1.1.0.)  I'll put my own
contribution of mapWith, filterWith, et al. on the chopping block as an
effort that, at least in its present form, doesn't provide enough extra
over mapPartitionsWithIndex, and whose syntax is awkward enough that I
don't believe these methods have ever been widely used, so that their
inclusion in the 1.0 API is probably not warranted.
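
As a concrete illustration of that point, here is a small sketch (not taken from the Spark codebase; the data and seeding logic are invented for the example) of the per-partition-setup pattern these helpers were aimed at, written directly against mapPartitionsWithIndex:

    import scala.util.Random
    import org.apache.spark.SparkContext

    // Illustrative only: build one helper object per partition (here a
    // Random seeded by the partition index) and then map each element
    // with it -- the use case mapWith covered -- using only the
    // mapPartitionsWithIndex primitive.
    object MapWithExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "mapWithExample")
        val data = sc.parallelize(1 to 100, 4)

        val jittered = data.mapPartitionsWithIndex { (index, iter) =>
          val rng = new Random(index)          // per-partition setup
          iter.map(x => x + rng.nextDouble())  // per-element mapping
        }

        println(jittered.take(5).mkString(", "))
        sc.stop()
      }
    }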

There are other elements of Spark that also should be culled and/or
refactored before 1.0.  Imran has listed a few. I'll also suggest that
there are at least parts of alternative Broadcast variable implementations
that should probably be left behind.  In any event, Imran is absolutely
correct that we need to have a discussion about these issues.  Moving to
1.0.0-SNAPSHOT forces us to begin that discussion.

So, I'm +1 for 1.0.0-incubating-SNAPSHOT (and looking forward to losing the
"incubating"!)




On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid  wrote:

> I don't really agree with this logic.  I think we haven't broken API so far
> because we just keep adding stuff on to it, and we haven't bothered to
> clean the api up, specifically to *avoid* breaking things.  Here's a
> handful of api breaking things that we might want to consider:
>
> * should we look at all the various configuration properties, and maybe
> some of them should get renamed for consistency / clarity?
> * do all of the functions on RDD need to be in core?  or do some of them
> that are simple additions built on top of the primitives really belong in a
> "utils" package or something?  Eg., maybe we should get rid of all the
> variants of the mapPartitions / mapWith / etc.  just have map, and
> mapPartitionsWithIndex  (too many choices in the api can also be confusing
> to the user)
> * are the right things getting tracked in SparkListener?  Do we need to add
> or remove anything?
>
> This is probably not the right list of questions, that's just an idea of
> the kind of thing we should be thinking about.
>
> It's also fine with me if 1.0 is next, I just think that we ought to be
> asking these kinds of questions up and down the entire api before we
> release 1.0.  And given that we haven't even started that discussion, it
> seems possible that there could be new features we'd like to release in
> 0.10 before that discussion is finished.
>
>
>
> On Thu, Feb 6, 2014 at 12:56 PM, Matei Z

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Patrick Wendell
Just to echo others - The relevant question is whether we want to
advertise stable APIs for users that we will support for a long time
horizon. And doing this is critical to being taken seriously as a
mature project.

The question is not whether or not there are things we want to improve
about Spark (further reduce dependencies, runtime stability, etc) - of
course everyone wants to improve those things!

In the next few months ahead of 1.0 the plan would be to invest effort
in finishing off loose ends in the API and of course, no 1.0 release
candidate will pass muster if these aren't addressed. I only see a few
fairly small blockers though wrt API issues:

- We should mark things that may evolve and change as semi-private
developer APIs (e.g. the Spark Listener).
- We need to standardize the Java API in a way that supports Java 8 lambdas.

Other than that - I don't see many blockers in terms of API changes we
might want to make. A lot of those were dealt with in 0.9 specifically
to prepare for this.

The broader question of API "clean-up" brings up a debate about the trade-off
of compatibility with older pre-1.0 versions of Spark. This is not
the primary issue under discussion and can be debated separately.

The primary issue at hand is whether to have 1.0 in ~3 months vs
pushing it to ~6 months from now or more.

- Patrick

On Thu, Feb 6, 2014 at 12:49 PM, Sandy Ryza  wrote:
> If the APIs are usable, stability and continuity are much more important
> than perfection.  With many already relying on the current APIs, I think
> trying to clean them up will just cause pain for users and integrators.
>  Hadoop made this mistake when they decided the original MapReduce APIs
> were ugly and introduced a new set of APIs to do the same thing.  Even
> though this happened in a pre-1.0 release, three years down the road, both
> the old and new APIs are still supported, causing endless confusion for
> users.  If individual functions or configuration properties have unclear
> names, they can be deprecated and replaced, but redoing the APIs or
> breaking compatibility at this point is simply not worth it.
>
>
> On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid  wrote:
>
>> I don't really agree with this logic.  I think we haven't broken API so far
>> because we just keep adding stuff on to it, and we haven't bothered to
>> clean the api up, specifically to *avoid* breaking things.  Here's a
>> handful of api breaking things that we might want to consider:
>>
>> * should we look at all the various configuration properties, and maybe
>> some of them should get renamed for consistency / clarity?
>> * do all of the functions on RDD need to be in core?  or do some of them
>> that are simple additions built on top of the primitives really belong in a
>> "utils" package or something?  Eg., maybe we should get rid of all the
>> variants of the mapPartitions / mapWith / etc.  just have map, and
>> mapPartitionsWithIndex  (too many choices in the api can also be confusing
>> to the user)
>> * are the right things getting tracked in SparkListener?  Do we need to add
>> or remove anything?
>>
>> This is probably not the right list of questions, that's just an idea of
>> the kind of thing we should be thinking about.
>>
>> It's also fine with me if 1.0 is next, I just think that we ought to be
>> asking these kinds of questions up and down the entire api before we
>> release 1.0.  And given that we haven't even started that discussion, it
>> seems possible that there could be new features we'd like to release in
>> 0.10 before that discussion is finished.
>>
>>
>>
>> On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia wrote:
>>
>> > I think it's important to do 1.0 next. The project has been around for 4
>> > years, and I'd be comfortable maintaining the current codebase for a long
>> > time in an API and binary compatible way through 1.x releases. Over the
>> > past 4 years we haven't actually had major changes to the user-facing
>> API --
>> > the only ones were changing the package to org.apache.spark, and
>> upgrading
>> > the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for
>> > example, or later cross-building it for Scala 2.11. Updating to 1.0 says
>> > two things: it tells users that they can be confident that version will
>> be
>> > maintained for a long time, which we absolutely want to do, and it lets
>> > outsiders see that the project is now fairly mature (for many people,
>> > pre-1.0 might still cause them not to try it). I think both are good for
>> > the community.
>> >
>> > Regarding binary compatibility, I agree that it's what we should strive
>> > for, but it just seems premature to codify now. Let's see how it works
>> > between, say, 1.0 and 1.1, and then we can codify it.
>> >
>> > Matei
>> >
>> > On Feb 6, 2014, at 10:43 AM, Henry Saputra 
>> > wrote:
>> >
>> > > Thanks Patrick for initiating the discussion about next road map for Apache
>> > Spark.
>> > >
>> > > I am +1 for 0.10.0 for next version.
>> 

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Sandy Ryza
If the APIs are usable, stability and continuity are much more important
than perfection.  With many already relying on the current APIs, I think
trying to clean them up will just cause pain for users and integrators.
 Hadoop made this mistake when they decided the original MapReduce APIs
were ugly and introduced a new set of APIs to do the same thing.  Even
though this happened in a pre-1.0 release, three years down the road, both
the old and new APIs are still supported, causing endless confusion for
users.  If individual functions or configuration properties have unclear
names, they can be deprecated and replaced, but redoing the APIs or
breaking compatibility at this point is simply not worth it.


On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid  wrote:

> I don't really agree with this logic.  I think we haven't broken API so far
> because we just keep adding stuff on to it, and we haven't bothered to
> clean the api up, specifically to *avoid* breaking things.  Here's a
> handful of api breaking things that we might want to consider:
>
> * should we look at all the various configuration properties, and maybe
> some of them should get renamed for consistency / clarity?
> * do all of the functions on RDD need to be in core?  or do some of them
> that are simple additions built on top of the primitives really belong in a
> "utils" package or something?  Eg., maybe we should get rid of all the
> variants of the mapPartitions / mapWith / etc.  just have map, and
> mapPartitionsWithIndex  (too many choices in the api can also be confusing
> to the user)
> * are the right things getting tracked in SparkListener?  Do we need to add
> or remove anything?
>
> This is probably not the right list of questions, that's just an idea of
> the kind of thing we should be thinking about.
>
> It's also fine with me if 1.0 is next, I just think that we ought to be
> asking these kinds of questions up and down the entire api before we
> release 1.0.  And given that we haven't even started that discussion, it
> seems possible that there could be new features we'd like to release in
> 0.10 before that discussion is finished.
>
>
>
> On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia wrote:
>
> > I think it's important to do 1.0 next. The project has been around for 4
> > years, and I'd be comfortable maintaining the current codebase for a long
> > time in an API and binary compatible way through 1.x releases. Over the
> > past 4 years we haven't actually had major changes to the user-facing
> API --
> > the only ones were changing the package to org.apache.spark, and
> upgrading
> > the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for
> > example, or later cross-building it for Scala 2.11. Updating to 1.0 says
> > two things: it tells users that they can be confident that version will
> be
> > maintained for a long time, which we absolutely want to do, and it lets
> > outsiders see that the project is now fairly mature (for many people,
> > pre-1.0 might still cause them not to try it). I think both are good for
> > the community.
> >
> > Regarding binary compatibility, I agree that it's what we should strive
> > for, but it just seems premature to codify now. Let's see how it works
> > between, say, 1.0 and 1.1, and then we can codify it.
> >
> > Matei
> >
> > On Feb 6, 2014, at 10:43 AM, Henry Saputra 
> > wrote:
> >
> > > Thanks Patrick for initiating the discussion about next road map for Apache
> > Spark.
> > >
> > > I am +1 for 0.10.0 for next version.
> > >
> > > It will give us as community some time to digest the process and the
> > > vision and make adjustment accordingly.
> > >
> > > Release a 1.0.0 is a huge milestone and if we do need to break API
> > > somehow or modify internal behavior dramatically we could take
> > > advantage to release 1.0.0 as good step to go to.
> > >
> > >
> > > - Henry
> > >
> > >
> > >
> > > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash 
> wrote:
> > >> Agree on timeboxed releases as well.
> > >>
> > >> Is there a vision for where we want to be as a project before
> declaring
> > the
> > >> first 1.0 release?  While we're in the 0.x days per semver we can
> break
> > >> backcompat at will (though we try to avoid it where possible), and
> that
> > >> luxury goes away with 1.x.  I just don't want to release a 1.0 simply
> > >> because it seems to follow after 0.9 rather than making an intentional
> > >> decision that we're at the point where we can stand by the current
> APIs
> > and
> > >> binary compatibility for the next year or so of the major release.
> > >>
> > >> Until that decision is made as a group I'd rather we do an immediate
> > >> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it
> > later,
> > >> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to
> 1.0
> > >> but not the other way around.
> > >>
> > >> https://github.com/apache/incubator-spark/pull/542
> > >>
> > >> Cheers!
> > >> Andrew
> > >>
> > >>
> > >> On Wed, Feb 5, 2014 at

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Imran Rashid
I don't really agree with this logic.  I think we haven't broken API so far
because we just keep adding stuff on to it, and we haven't bothered to
clean the api up, specifically to *avoid* breaking things.  Here's a
handful of api breaking things that we might want to consider:

* should we look at all the various configuration properties, and maybe
some of them should get renamed for consistency / clarity?
* do all of the functions on RDD need to be in core?  or do some of them
that are simple additions built on top of the primitives really belong in a
"utils" package or something?  Eg., maybe we should get rid of all the
variants of the mapPartitions / mapWith / etc.  just have map, and
mapPartitionsWithIndex  (too many choices in the api can also be confusing
to the user)
* are the right things getting tracked in SparkListener?  Do we need to add
or remove anything?

This is probably not the right list of questions, that's just an idea of
the kind of thing we should be thinking about.

It's also fine with me if 1.0 is next, I just think that we ought to be
asking these kinds of questions up and down the entire api before we
release 1.0.  And given that we haven't even started that discussion, it
seems possible that there could be new features we'd like to release in
0.10 before that discussion is finished.



On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia wrote:

> I think it's important to do 1.0 next. The project has been around for 4
> years, and I'd be comfortable maintaining the current codebase for a long
> time in an API and binary compatible way through 1.x releases. Over the
> past 4 years we haven't actually had major changes to the user-facing API --
> the only ones were changing the package to org.apache.spark, and upgrading
> the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for
> example, or later cross-building it for Scala 2.11. Updating to 1.0 says
> two things: it tells users that they can be confident that version will be
> maintained for a long time, which we absolutely want to do, and it lets
> outsiders see that the project is now fairly mature (for many people,
> pre-1.0 might still cause them not to try it). I think both are good for
> the community.
>
> Regarding binary compatibility, I agree that it's what we should strive
> for, but it just seems premature to codify now. Let's see how it works
> between, say, 1.0 and 1.1, and then we can codify it.
>
> Matei
>
> On Feb 6, 2014, at 10:43 AM, Henry Saputra 
> wrote:
>
> > Thanks Patrick for initiating the discussion about next road map for Apache
> Spark.
> >
> > I am +1 for 0.10.0 for next version.
> >
> > It will give us as community some time to digest the process and the
> > vision and make adjustment accordingly.
> >
> > Release a 1.0.0 is a huge milestone and if we do need to break API
> > somehow or modify internal behavior dramatically we could take
> > advantage to release 1.0.0 as good step to go to.
> >
> >
> > - Henry
> >
> >
> >
> > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash  wrote:
> >> Agree on timeboxed releases as well.
> >>
> >> Is there a vision for where we want to be as a project before declaring
> the
> >> first 1.0 release?  While we're in the 0.x days per semver we can break
> >> backcompat at will (though we try to avoid it where possible), and that
> >> luxury goes away with 1.x.  I just don't want to release a 1.0 simply
> >> because it seems to follow after 0.9 rather than making an intentional
> >> decision that we're at the point where we can stand by the current APIs
> and
> >> binary compatibility for the next year or so of the major release.
> >>
> >> Until that decision is made as a group I'd rather we do an immediate
> >> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it
> later,
> >> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
> >> but not the other way around.
> >>
> >> https://github.com/apache/incubator-spark/pull/542
> >>
> >> Cheers!
> >> Andrew
> >>
> >>
> >> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun wrote:
> >>
> >>> +1 on time boxed releases and compatibility guidelines
> >>>
> >>>
>  Am 06.02.2014 um 01:20 schrieb Patrick Wendell :
> 
>  Hi Everyone,
> 
>  In an effort to coordinate development amongst the growing list of
>  Spark contributors, I've taken some time to write up a proposal to
>  formalize various pieces of the development process. The next release
>  of Spark will likely be Spark 1.0.0, so this message is intended in
>  part to coordinate the release plan for 1.0.0 and future releases.
>  I'll post this on the wiki after discussing it on this thread as
>  tentative project guidelines.
> 
>  == Spark Release Structure ==
>  Starting with Spark 1.0.0, the Spark project will follow the semantic
>  versioning guidelines (http://semver.org/) with a few deviations.
>  These small differences account for Spark's nature as a multi-module
>  project.

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Reynold Xin
+1 for 1.0


The point of 1.0 is for us to self-enforce API compatibility in the context
of longer term support. If we continue down the 0.xx road, we will always
have an excuse for breaking APIs. That said, a major focus of 0.9 and some of
the work that is happening for 1.0 (e.g. configuration, Java 8 closure
support, security) are for better API compatibility support in 1.x releases.

While not perfect, Spark as it stands is already more mature than many (ASF)
projects that are versioned 1.x, 2.x, or even 10.x. Software releases are
always a moving target. 1.0 doesn't mean it is "perfect" and "final". The
project will still evolve.




On Thu, Feb 6, 2014 at 11:54 AM, Evan Chan  wrote:

> +1 for 0.10.0.
>
> It would give more time to study things (such as the new SparkConf)
> and let the community decide if any breaking API changes are needed.
>
> Also, a +1 for minor revisions not breaking code compatibility,
> including Scala versions.   (I guess this would mean that 1.x would
> stay on Scala 2.10.x)
>
> On Thu, Feb 6, 2014 at 11:05 AM, Sandy Ryza 
> wrote:
> > Bleh, hit send too early again.  My second paragraph was to argue for
> 1.0.0
> > instead of 0.10.0, not to hammer on the binary compatibility point.
> >
> >
> > On Thu, Feb 6, 2014 at 11:04 AM, Sandy Ryza 
> wrote:
> >
> >> *Would it make sense to put in something that strongly discourages
> binary
> >> incompatible changes when possible?
> >>
> >>
> >> On Thu, Feb 6, 2014 at 11:03 AM, Sandy Ryza  >wrote:
> >>
> >>> Not codifying binary compatibility as a hard rule sounds fine to me.
> >>>  Would it make sense to put something in that . I.e. avoid making
> needless
> >>> changes to class hierarchies.
> >>>
> >>> Whether Spark considers itself stable or not, users are beginning to
> >>> treat it so.  A responsible project will acknowledge this and provide
> the
> >>> stability needed by its user base.  I think some projects have made the
> >>> mistake of waiting too long to release a 1.0.0.  It allows them to put
> off
> >>> making the hard decisions, but users and downstream projects suffer.
> >>>
> >>> If Spark needs to go through dramatic changes, there's always the
> option
> >>> of a 2.0.0 that allows for this.
> >>>
> >>> -Sandy
> >>>
> >>>
> >>>
> >>> On Thu, Feb 6, 2014 at 10:56 AM, Matei Zaharia <
> matei.zaha...@gmail.com>wrote:
> >>>
>  I think it's important to do 1.0 next. The project has been around
> for 4
>  years, and I'd be comfortable maintaining the current codebase for a
> long
>  time in an API and binary compatible way through 1.x releases. Over
> the
>  past 4 years we haven't actually had major changes to the user-facing
> API --
>  the only ones were changing the package to org.apache.spark, and
> upgrading
>  the Scala version. I'd be okay leaving 1.x to always use Scala 2.10
> for
>  example, or later cross-building it for Scala 2.11. Updating to 1.0
> says
>  two things: it tells users that they can be confident that version
> will be
>  maintained for a long time, which we absolutely want to do, and it
> lets
>  outsiders see that the project is now fairly mature (for many people,
>  pre-1.0 might still cause them not to try it). I think both are good
> for
>  the community.
> 
>  Regarding binary compatibility, I agree that it's what we should
> strive
>  for, but it just seems premature to codify now. Let's see how it works
>  between, say, 1.0 and 1.1, and then we can codify it.
> 
>  Matei
> 
>  On Feb 6, 2014, at 10:43 AM, Henry Saputra 
>  wrote:
> 
>  > Thanks Patrick for initiating the discussion about the next road map for
>  Apache Spark.
>  >
>  > I am +1 for 0.10.0 as the next version.
>  >
>  > It will give us as a community some time to digest the process and the
>  > vision and make adjustments accordingly.
>  >
>  > Releasing 1.0.0 is a huge milestone, and if we do need to break the API
>  > somehow or modify internal behavior dramatically, we could take
>  > advantage of the 1.0.0 release as a good opportunity to do so.
>  >
>  >
>  > - Henry
>  >
>  >
>  >
>  > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash 
>  wrote:
>  >> Agree on timeboxed releases as well.
>  >>
>  >> Is there a vision for where we want to be as a project before
>  declaring the
>  >> first 1.0 release?  While we're in the 0.x days per semver we can
>  break
>  >> backcompat at will (though we try to avoid it where possible), and
>  that
>  >> luxury goes away with 1.x. I just don't want to release a 1.0
> simply
>  >> because it seems to follow after 0.9 rather than making an
> intentional
>  >> decision that we're at the point where we can stand by the current
>  APIs and
>  >> binary compatibility for the next year or so of the major release.
>  >>
>  >> Until that decision is made as a group I'd rather we do an
> immediate
>  >> version bump to 0

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Matei Zaharia

On Feb 6, 2014, at 11:56 AM, Evan Chan  wrote:

> The other reasons for waiting are things like stability.
> 
> It would be great to have as a goal for 1.0.0 that under most heavy
> use scenarios, workers and executors don't just die, which is not true
> today.
> Also, there should be minimal "silent failures" which are difficult to debug.
> 

I think this is orthogonal to the version number. 1.x versions can have bugs — 
it’s almost unavoidable in the distributed system space. The version number is 
more about the level of compatibility and support people can expect, which I 
think is something we want to solidify. Calling it 1.x will also make it more 
likely that we have long-term maintenance releases, because with the current 
project, people expect that they have to keep jumping to the latest version. 
Just as an example, when we did a survey a while back, out of ~100 respondents, 
all were either on the very latest release or on master (!). I’ve had multiple 
people ask me about longer-term supported versions (e.g. if I download 1.x now, 
will it still have maintenance releases a year from now, or will it be left in 
the dust).

Matei



Re: Proposal for Spark Release Strategy

2014-02-06 Thread Matei Zaharia
On Feb 6, 2014, at 11:04 AM, Sandy Ryza  wrote:

> *Would it make sense to put in something that strongly discourages binary
> incompatible changes when possible?

Yes, I like this idea. Let’s just say we’ll strive for this as much as possible 
and think about codifying it after some experience doing this.

Matei




Re: Proposal for Spark Release Strategy

2014-02-06 Thread Evan Chan
The other reasons for waiting are things like stability.

It would be great to have as a goal for 1.0.0 that under most heavy
use scenarios, workers and executors don't just die, which is not true
today.
Also, there should be minimal "silent failures" which are difficult to debug.

On Thu, Feb 6, 2014 at 11:54 AM, Evan Chan  wrote:
> +1 for 0.10.0.
>
> It would give more time to study things (such as the new SparkConf)
> and let the community decide if any breaking API changes are needed.
>
> Also, a +1 for minor revisions not breaking code compatibility,
> including Scala versions.   (I guess this would mean that 1.x would
> stay on Scala 2.10.x)
>
> On Thu, Feb 6, 2014 at 11:05 AM, Sandy Ryza  wrote:
>> Bleh, hit send too early again.  My second paragraph was to argue for 1.0.0
>> instead of 0.10.0, not to hammer on the binary compatibility point.
>>
>>
>> On Thu, Feb 6, 2014 at 11:04 AM, Sandy Ryza  wrote:
>>
>>> *Would it make sense to put in something that strongly discourages binary
>>> incompatible changes when possible?
>>>
>>>
>>> On Thu, Feb 6, 2014 at 11:03 AM, Sandy Ryza wrote:
>>>
 Not codifying binary compatibility as a hard rule sounds fine to me.
  Would it make sense to put something in that . I.e. avoid making needless
 changes to class hierarchies.

 Whether Spark considers itself stable or not, users are beginning to
 treat it so.  A responsible project will acknowledge this and provide the
 stability needed by its user base.  I think some projects have made the
 mistake of waiting too long to release a 1.0.0.  It allows them to put off
 making the hard decisions, but users and downstream projects suffer.

 If Spark needs to go through dramatic changes, there's always the option
 of a 2.0.0 that allows for this.

 -Sandy



 On Thu, Feb 6, 2014 at 10:56 AM, Matei Zaharia 
 wrote:

> I think it's important to do 1.0 next. The project has been around for 4
> years, and I'd be comfortable maintaining the current codebase for a long
> time in an API and binary compatible way through 1.x releases. Over the
> past 4 years we haven't actually had major changes to the user-facing API 
> --
> the only ones were changing the package to org.apache.spark, and upgrading
> the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for
> example, or later cross-building it for Scala 2.11. Updating to 1.0 says
> two things: it tells users that they can be confident that version will be
> maintained for a long time, which we absolutely want to do, and it lets
> outsiders see that the project is now fairly mature (for many people,
> pre-1.0 might still cause them not to try it). I think both are good for
> the community.
>
> Regarding binary compatibility, I agree that it's what we should strive
> for, but it just seems premature to codify now. Let's see how it works
> between, say, 1.0 and 1.1, and then we can codify it.
>
> Matei
>
> On Feb 6, 2014, at 10:43 AM, Henry Saputra 
> wrote:
>
> > Thanks Patrick for initiating the discussion about the next road map for
> Apache Spark.
> >
> > I am +1 for 0.10.0 as the next version.
> >
> > It will give us as a community some time to digest the process and the
> > vision and make adjustments accordingly.
> >
> > Releasing 1.0.0 is a huge milestone, and if we do need to break the API
> > somehow or modify internal behavior dramatically, we could take
> > advantage of the 1.0.0 release as a good opportunity to do so.
> >
> >
> > - Henry
> >
> >
> >
> > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash 
> wrote:
> >> Agree on timeboxed releases as well.
> >>
> >> Is there a vision for where we want to be as a project before
> declaring the
> >> first 1.0 release?  While we're in the 0.x days per semver we can
> break
> >> backcompat at will (though we try to avoid it where possible), and
> that
> >> luxury goes away with 1.x. I just don't want to release a 1.0 simply
> >> because it seems to follow after 0.9 rather than making an intentional
> >> decision that we're at the point where we can stand by the current
> APIs and
> >> binary compatibility for the next year or so of the major release.
> >>
> >> Until that decision is made as a group I'd rather we do an immediate
> >> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it
> later,
> >> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to
> 1.0
> >> but not the other way around.
> >>
> >> https://github.com/apache/incubator-spark/pull/542
> >>
> >> Cheers!
> >> Andrew
> >>
> >>
> >> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun  >wrote:
> >>
> >>> +1 on time boxed releases and compatibility guidelines
> >>>
> >>>
>  Am 06.02.2014 u

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Evan Chan
+1 for 0.10.0.

It would give more time to study things (such as the new SparkConf)
and let the community decide if any breaking API changes are needed.

Also, a +1 for minor revisions not breaking code compatibility,
including Scala versions.   (I guess this would mean that 1.x would
stay on Scala 2.10.x)

On Thu, Feb 6, 2014 at 11:05 AM, Sandy Ryza  wrote:
> Bleh, hit send too early again.  My second paragraph was to argue for 1.0.0
> instead of 0.10.0, not to hammer on the binary compatibility point.
>
>
> On Thu, Feb 6, 2014 at 11:04 AM, Sandy Ryza  wrote:
>
>> *Would it make sense to put in something that strongly discourages binary
>> incompatible changes when possible?
>>
>>
>> On Thu, Feb 6, 2014 at 11:03 AM, Sandy Ryza wrote:
>>
>>> Not codifying binary compatibility as a hard rule sounds fine to me.
>>>  Would it make sense to put something in that . I.e. avoid making needless
>>> changes to class hierarchies.
>>>
>>> Whether Spark considers itself stable or not, users are beginning to
>>> treat it so.  A responsible project will acknowledge this and provide the
>>> stability needed by its user base.  I think some projects have made the
>>> mistake of waiting too long to release a 1.0.0.  It allows them to put off
>>> making the hard decisions, but users and downstream projects suffer.
>>>
>>> If Spark needs to go through dramatic changes, there's always the option
>>> of a 2.0.0 that allows for this.
>>>
>>> -Sandy
>>>
>>>
>>>
>>> On Thu, Feb 6, 2014 at 10:56 AM, Matei Zaharia 
>>> wrote:
>>>
 I think it's important to do 1.0 next. The project has been around for 4
 years, and I'd be comfortable maintaining the current codebase for a long
 time in an API and binary compatible way through 1.x releases. Over the
 past 4 years we haven't actually had major changes to the user-facing API 
 --
 the only ones were changing the package to org.apache.spark, and upgrading
 the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for
 example, or later cross-building it for Scala 2.11. Updating to 1.0 says
 two things: it tells users that they can be confident that version will be
 maintained for a long time, which we absolutely want to do, and it lets
 outsiders see that the project is now fairly mature (for many people,
 pre-1.0 might still cause them not to try it). I think both are good for
 the community.

 Regarding binary compatibility, I agree that it's what we should strive
 for, but it just seems premature to codify now. Let's see how it works
 between, say, 1.0 and 1.1, and then we can codify it.

 Matei

 On Feb 6, 2014, at 10:43 AM, Henry Saputra 
 wrote:

 > Thanks Patrick for initiating the discussion about the next road map for
 Apache Spark.
 >
 > I am +1 for 0.10.0 as the next version.
 >
 > It will give us as a community some time to digest the process and the
 > vision and make adjustments accordingly.
 >
 > Releasing 1.0.0 is a huge milestone, and if we do need to break the API
 > somehow or modify internal behavior dramatically, we could take
 > advantage of the 1.0.0 release as a good opportunity to do so.
 >
 >
 > - Henry
 >
 >
 >
 > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash 
 wrote:
 >> Agree on timeboxed releases as well.
 >>
 >> Is there a vision for where we want to be as a project before
 declaring the
 >> first 1.0 release?  While we're in the 0.x days per semver we can
 break
 >> backcompat at will (though we try to avoid it where possible), and
 that
 >> luxury goes away with 1.x. I just don't want to release a 1.0 simply
 >> because it seems to follow after 0.9 rather than making an intentional
 >> decision that we're at the point where we can stand by the current
 APIs and
 >> binary compatibility for the next year or so of the major release.
 >>
 >> Until that decision is made as a group I'd rather we do an immediate
 >> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it
 later,
 >> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to
 1.0
 >> but not the other way around.
 >>
 >> https://github.com/apache/incubator-spark/pull/542
 >>
 >> Cheers!
 >> Andrew
 >>
 >>
 >> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun >>> >wrote:
 >>
 >>> +1 on time boxed releases and compatibility guidelines
 >>>
 >>>
  On 06.02.2014 at 01:20, Patrick Wendell wrote:
 
  Hi Everyone,
 
  In an effort to coordinate development amongst the growing list of
  Spark contributors, I've taken some time to write up a proposal to
  formalize various pieces of the development process. The next
 release
  of Spark will likely be Spark 1.0.0, so this message is intended in
  part to coordinate the release plan for 1.0.0 and

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Sandy Ryza
Not codifying binary compatibility as a hard rule sounds fine to me.  Would
it make sense to put something in that . I.e. avoid making needless changes
to class hierarchies.

Whether Spark considers itself stable or not, users are beginning to treat
it so.  A responsible project will acknowledge this and provide the
stability needed by its user base.  I think some projects have made the
mistake of waiting too long to release a 1.0.0.  It allows them to put off
making the hard decisions, but users and downstream projects suffer.

If Spark needs to go through dramatic changes, there's always the option of
a 2.0.0 that allows for this.

-Sandy



On Thu, Feb 6, 2014 at 10:56 AM, Matei Zaharia wrote:

> I think it's important to do 1.0 next. The project has been around for 4
> years, and I'd be comfortable maintaining the current codebase for a long
> time in an API and binary compatible way through 1.x releases. Over the
> past 4 years we haven't actually had major changes to the user-facing API --
> the only ones were changing the package to org.apache.spark, and upgrading
> the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for
> example, or later cross-building it for Scala 2.11. Updating to 1.0 says
> two things: it tells users that they can be confident that version will be
> maintained for a long time, which we absolutely want to do, and it lets
> outsiders see that the project is now fairly mature (for many people,
> pre-1.0 might still cause them not to try it). I think both are good for
> the community.
>
> Regarding binary compatibility, I agree that it's what we should strive
> for, but it just seems premature to codify now. Let's see how it works
> between, say, 1.0 and 1.1, and then we can codify it.
>
> Matei
>
> On Feb 6, 2014, at 10:43 AM, Henry Saputra 
> wrote:
>
> > Thanks Patrick for initiating the discussion about the next road map for Apache
> Spark.
> >
> > I am +1 for 0.10.0 as the next version.
> >
> > It will give us as a community some time to digest the process and the
> > vision and make adjustments accordingly.
> >
> > Releasing 1.0.0 is a huge milestone, and if we do need to break the API
> > somehow or modify internal behavior dramatically, we could take
> > advantage of the 1.0.0 release as a good opportunity to do so.
> >
> >
> > - Henry
> >
> >
> >
> > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash  wrote:
> >> Agree on timeboxed releases as well.
> >>
> >> Is there a vision for where we want to be as a project before declaring
> the
> >> first 1.0 release?  While we're in the 0.x days per semver we can break
> >> backcompat at will (though we try to avoid it where possible), and that
> >> luxury goes away with 1.x. I just don't want to release a 1.0 simply
> >> because it seems to follow after 0.9 rather than making an intentional
> >> decision that we're at the point where we can stand by the current APIs
> and
> >> binary compatibility for the next year or so of the major release.
> >>
> >> Until that decision is made as a group I'd rather we do an immediate
> >> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it
> later,
> >> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
> >> but not the other way around.
> >>
> >> https://github.com/apache/incubator-spark/pull/542
> >>
> >> Cheers!
> >> Andrew
> >>
> >>
> >> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun  >wrote:
> >>
> >>> +1 on time boxed releases and compatibility guidelines
> >>>
> >>>
>  On 06.02.2014 at 01:20, Patrick Wendell wrote:
> 
>  Hi Everyone,
> 
>  In an effort to coordinate development amongst the growing list of
>  Spark contributors, I've taken some time to write up a proposal to
>  formalize various pieces of the development process. The next release
>  of Spark will likely be Spark 1.0.0, so this message is intended in
>  part to coordinate the release plan for 1.0.0 and future releases.
>  I'll post this on the wiki after discussing it on this thread as
>  tentative project guidelines.
> 
>  == Spark Release Structure ==
>  Starting with Spark 1.0.0, the Spark project will follow the semantic
>  versioning guidelines (http://semver.org/) with a few deviations.
>  These small differences account for Spark's nature as a multi-module
>  project.
> 
>  Each Spark release will be versioned:
>  [MAJOR].[MINOR].[MAINTENANCE]
> 
>  All releases with the same major version number will have API
>  compatibility, defined as [1]. Major version numbers will remain
>  stable over long periods of time. For instance, 1.X.Y may last 1 year
>  or more.
> 
>  Minor releases will typically contain new features and improvements.
>  The target frequency for minor releases is every 3-4 months. One
>  change we'd like to make is to announce fixed release dates and merge
>  windows for each release, to facilitate coordination. Each minor
>  release will have a merge window where n

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Sandy Ryza
*Would it make sense to put in something that strongly discourages binary
incompatible changes when possible?


On Thu, Feb 6, 2014 at 11:03 AM, Sandy Ryza  wrote:

> Not codifying binary compatibility as a hard rule sounds fine to me.
>  Would it make sense to put something in that . I.e. avoid making needless
> changes to class hierarchies.
>
> Whether Spark considers itself stable or not, users are beginning to treat
> it so.  A responsible project will acknowledge this and provide the
> stability needed by its user base.  I think some projects have made the
> mistake of waiting too long to release a 1.0.0.  It allows them to put off
> making the hard decisions, but users and downstream projects suffer.
>
> If Spark needs to go through dramatic changes, there's always the option
> of a 2.0.0 that allows for this.
>
> -Sandy
>
>
>
> On Thu, Feb 6, 2014 at 10:56 AM, Matei Zaharia wrote:
>
>> I think it's important to do 1.0 next. The project has been around for 4
>> years, and I'd be comfortable maintaining the current codebase for a long
>> time in an API and binary compatible way through 1.x releases. Over the
>> past 4 years we haven't actually had major changes to the user-facing API --
>> the only ones were changing the package to org.apache.spark, and upgrading
>> the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for
>> example, or later cross-building it for Scala 2.11. Updating to 1.0 says
>> two things: it tells users that they can be confident that version will be
>> maintained for a long time, which we absolutely want to do, and it lets
>> outsiders see that the project is now fairly mature (for many people,
>> pre-1.0 might still cause them not to try it). I think both are good for
>> the community.
>>
>> Regarding binary compatibility, I agree that it's what we should strive
>> for, but it just seems premature to codify now. Let's see how it works
>> between, say, 1.0 and 1.1, and then we can codify it.
>>
>> Matei
>>
>> On Feb 6, 2014, at 10:43 AM, Henry Saputra 
>> wrote:
>>
>> > Thanks Patrick for initiating the discussion about the next road map for Apache
>> Spark.
>> >
>> > I am +1 for 0.10.0 as the next version.
>> >
>> > It will give us as a community some time to digest the process and the
>> > vision and make adjustments accordingly.
>> >
>> > Releasing 1.0.0 is a huge milestone, and if we do need to break the API
>> > somehow or modify internal behavior dramatically, we could take
>> > advantage of the 1.0.0 release as a good opportunity to do so.
>> >
>> >
>> > - Henry
>> >
>> >
>> >
>> > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash 
>> wrote:
>> >> Agree on timeboxed releases as well.
>> >>
>> >> Is there a vision for where we want to be as a project before
>> declaring the
>> >> first 1.0 release?  While we're in the 0.x days per semver we can break
>> >> backcompat at will (though we try to avoid it where possible), and that
>> >> luxury goes away with 1.x. I just don't want to release a 1.0 simply
>> >> because it seems to follow after 0.9 rather than making an intentional
>> >> decision that we're at the point where we can stand by the current
>> APIs and
>> >> binary compatibility for the next year or so of the major release.
>> >>
>> >> Until that decision is made as a group I'd rather we do an immediate
>> >> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it
>> later,
>> >> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to
>> 1.0
>> >> but not the other way around.
>> >>
>> >> https://github.com/apache/incubator-spark/pull/542
>> >>
>> >> Cheers!
>> >> Andrew
>> >>
>> >>
>> >> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun > >wrote:
>> >>
>> >>> +1 on time boxed releases and compatibility guidelines
>> >>>
>> >>>
>>  On 06.02.2014 at 01:20, Patrick Wendell wrote:
>> 
>>  Hi Everyone,
>> 
>>  In an effort to coordinate development amongst the growing list of
>>  Spark contributors, I've taken some time to write up a proposal to
>>  formalize various pieces of the development process. The next release
>>  of Spark will likely be Spark 1.0.0, so this message is intended in
>>  part to coordinate the release plan for 1.0.0 and future releases.
>>  I'll post this on the wiki after discussing it on this thread as
>>  tentative project guidelines.
>> 
>>  == Spark Release Structure ==
>>  Starting with Spark 1.0.0, the Spark project will follow the semantic
>>  versioning guidelines (http://semver.org/) with a few deviations.
>>  These small differences account for Spark's nature as a multi-module
>>  project.
>> 
>>  Each Spark release will be versioned:
>>  [MAJOR].[MINOR].[MAINTENANCE]
>> 
>>  All releases with the same major version number will have API
>>  compatibility, defined as [1]. Major version numbers will remain
>>  stable over long periods of time. For instance, 1.X.Y may last 1 year
>>  or more.
>> 
>>  Minor releases will typically contain 

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Sandy Ryza
Bleh, hit send too early again.  My second paragraph was to argue for 1.0.0
instead of 0.10.0, not to hammer on the binary compatibility point.


On Thu, Feb 6, 2014 at 11:04 AM, Sandy Ryza  wrote:

> *Would it make sense to put in something that strongly discourages binary
> incompatible changes when possible?
>
>
> On Thu, Feb 6, 2014 at 11:03 AM, Sandy Ryza wrote:
>
>> Not codifying binary compatibility as a hard rule sounds fine to me.
>>  Would it make sense to put something in that . I.e. avoid making needless
>> changes to class hierarchies.
>>
>> Whether Spark considers itself stable or not, users are beginning to
>> treat it so.  A responsible project will acknowledge this and provide the
>> stability needed by its user base.  I think some projects have made the
>> mistake of waiting too long to release a 1.0.0.  It allows them to put off
>> making the hard decisions, but users and downstream projects suffer.
>>
>> If Spark needs to go through dramatic changes, there's always the option
>> of a 2.0.0 that allows for this.
>>
>> -Sandy
>>
>>
>>
>> On Thu, Feb 6, 2014 at 10:56 AM, Matei Zaharia 
>> wrote:
>>
>>> I think it's important to do 1.0 next. The project has been around for 4
>>> years, and I'd be comfortable maintaining the current codebase for a long
>>> time in an API and binary compatible way through 1.x releases. Over the
>>> past 4 years we haven't actually had major changes to the user-facing API --
>>> the only ones were changing the package to org.apache.spark, and upgrading
>>> the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for
>>> example, or later cross-building it for Scala 2.11. Updating to 1.0 says
>>> two things: it tells users that they can be confident that version will be
>>> maintained for a long time, which we absolutely want to do, and it lets
>>> outsiders see that the project is now fairly mature (for many people,
>>> pre-1.0 might still cause them not to try it). I think both are good for
>>> the community.
>>>
>>> Regarding binary compatibility, I agree that it's what we should strive
>>> for, but it just seems premature to codify now. Let's see how it works
>>> between, say, 1.0 and 1.1, and then we can codify it.
>>>
>>> Matei
>>>
>>> On Feb 6, 2014, at 10:43 AM, Henry Saputra 
>>> wrote:
>>>
>>> > Thanks Patrick for initiating the discussion about the next road map for
>>> Apache Spark.
>>> >
>>> > I am +1 for 0.10.0 as the next version.
>>> >
>>> > It will give us as a community some time to digest the process and the
>>> > vision and make adjustments accordingly.
>>> >
>>> > Releasing 1.0.0 is a huge milestone, and if we do need to break the API
>>> > somehow or modify internal behavior dramatically, we could take
>>> > advantage of the 1.0.0 release as a good opportunity to do so.
>>> >
>>> >
>>> > - Henry
>>> >
>>> >
>>> >
>>> > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash 
>>> wrote:
>>> >> Agree on timeboxed releases as well.
>>> >>
>>> >> Is there a vision for where we want to be as a project before
>>> declaring the
>>> >> first 1.0 release?  While we're in the 0.x days per semver we can
>>> break
>>> >> backcompat at will (though we try to avoid it where possible), and
>>> that
>>> >> luxury goes away with 1.x. I just don't want to release a 1.0 simply
>>> >> because it seems to follow after 0.9 rather than making an intentional
>>> >> decision that we're at the point where we can stand by the current
>>> APIs and
>>> >> binary compatibility for the next year or so of the major release.
>>> >>
>>> >> Until that decision is made as a group I'd rather we do an immediate
>>> >> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it
>>> later,
>>> >> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to
>>> 1.0
>>> >> but not the other way around.
>>> >>
>>> >> https://github.com/apache/incubator-spark/pull/542
>>> >>
>>> >> Cheers!
>>> >> Andrew
>>> >>
>>> >>
>>> >> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun >> >wrote:
>>> >>
>>> >>> +1 on time boxed releases and compatibility guidelines
>>> >>>
>>> >>>
>>>  On 06.02.2014 at 01:20, Patrick Wendell wrote:
>>> 
>>>  Hi Everyone,
>>> 
>>>  In an effort to coordinate development amongst the growing list of
>>>  Spark contributors, I've taken some time to write up a proposal to
>>>  formalize various pieces of the development process. The next
>>> release
>>>  of Spark will likely be Spark 1.0.0, so this message is intended in
>>>  part to coordinate the release plan for 1.0.0 and future releases.
>>>  I'll post this on the wiki after discussing it on this thread as
>>>  tentative project guidelines.
>>> 
>>>  == Spark Release Structure ==
>>>  Starting with Spark 1.0.0, the Spark project will follow the
>>> semantic
>>>  versioning guidelines (http://semver.org/) with a few deviations.
>>>  These small differences account for Spark's nature as a multi-module
>>>  project.
>>> 
>>>  Each Spark release will be versioned

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Matei Zaharia
I think it’s important to do 1.0 next. The project has been around for 4 years, 
and I’d be comfortable maintaining the current codebase for a long time in an 
API and binary compatible way through 1.x releases. Over the past 4 years we 
haven’t actually had major changes to the user-facing API — the only ones were 
changing the package to org.apache.spark, and upgrading the Scala version. I’d 
be okay leaving 1.x to always use Scala 2.10 for example, or later 
cross-building it for Scala 2.11. Updating to 1.0 says two things: it tells 
users that they can be confident that version will be maintained for a long 
time, which we absolutely want to do, and it lets outsiders see that the 
project is now fairly mature (for many people, pre-1.0 might still cause them 
not to try it). I think both are good for the community.

Regarding binary compatibility, I agree that it’s what we should strive for, 
but it just seems premature to codify now. Let’s see how it works between, say, 
1.0 and 1.1, and then we can codify it.

Matei
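
As a concrete reading of the cross-building option mentioned above, the sbt
side of it is roughly the sketch below (the Scala version numbers are
placeholders, not a statement of what Spark 1.x will actually ship with):

    // build.sbt -- illustrative only; version numbers are placeholders
    scalaVersion := "2.10.4"

    // Prefixing a task with "+" (e.g. "+package", "+publish") then runs it
    // once per listed Scala version, producing separate _2.10 and _2.11
    // artifacts that downstream projects can link against without
    // recompiling their own code.
    crossScalaVersions := Seq("2.10.4", "2.11.0")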

On Feb 6, 2014, at 10:43 AM, Henry Saputra  wrote:

> Thanks Patrick for initiating the discussion about the next road map for Apache Spark.
> 
> I am +1 for 0.10.0 as the next version.
> 
> It will give us as a community some time to digest the process and the
> vision and make adjustments accordingly.
> 
> Releasing 1.0.0 is a huge milestone, and if we do need to break the API
> somehow or modify internal behavior dramatically, we could take
> advantage of the 1.0.0 release as a good opportunity to do so.
> 
> 
> - Henry
> 
> 
> 
> On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash  wrote:
>> Agree on timeboxed releases as well.
>> 
>> Is there a vision for where we want to be as a project before declaring the
>> first 1.0 release?  While we're in the 0.x days per semver we can break
>> backcompat at will (though we try to avoid it where possible), and that
>> luxury goes away with 1.x. I just don't want to release a 1.0 simply
>> because it seems to follow after 0.9 rather than making an intentional
>> decision that we're at the point where we can stand by the current APIs and
>> binary compatibility for the next year or so of the major release.
>> 
>> Until that decision is made as a group I'd rather we do an immediate
>> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later,
>> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
>> but not the other way around.
>> 
>> https://github.com/apache/incubator-spark/pull/542
>> 
>> Cheers!
>> Andrew
>> 
>> 
>> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun wrote:
>> 
>>> +1 on time boxed releases and compatibility guidelines
>>> 
>>> 
 On 06.02.2014 at 01:20, Patrick Wendell wrote:
 
 Hi Everyone,
 
 In an effort to coordinate development amongst the growing list of
 Spark contributors, I've taken some time to write up a proposal to
 formalize various pieces of the development process. The next release
 of Spark will likely be Spark 1.0.0, so this message is intended in
 part to coordinate the release plan for 1.0.0 and future releases.
 I'll post this on the wiki after discussing it on this thread as
 tentative project guidelines.
 
 == Spark Release Structure ==
 Starting with Spark 1.0.0, the Spark project will follow the semantic
 versioning guidelines (http://semver.org/) with a few deviations.
 These small differences account for Spark's nature as a multi-module
 project.
 
 Each Spark release will be versioned:
 [MAJOR].[MINOR].[MAINTENANCE]
 
 All releases with the same major version number will have API
 compatibility, defined as [1]. Major version numbers will remain
 stable over long periods of time. For instance, 1.X.Y may last 1 year
 or more.
 
 Minor releases will typically contain new features and improvements.
 The target frequency for minor releases is every 3-4 months. One
 change we'd like to make is to announce fixed release dates and merge
 windows for each release, to facilitate coordination. Each minor
 release will have a merge window where new patches can be merged, a QA
 window when only fixes can be merged, then a final period where voting
 occurs on release candidates. These windows will be announced
 immediately after the previous minor release to give people plenty of
 time, and over time, we might make the whole release process more
 regular (similar to Ubuntu). At the bottom of this document is an
 example window for the 1.0.0 release.
 
 Maintenance releases will occur more frequently and depend on specific
 patches introduced (e.g. bug fixes) and their urgency. In general
 these releases are designed to patch bugs. However, higher level
 libraries may introduce small features, such as a new algorithm,
 provided they are entirely additive and isolated from existing code
 paths. Spark core may not introduce any features.
 
 When 
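
To make the quoted [MAJOR].[MINOR].[MAINTENANCE] scheme concrete, the rule
"same major version implies API compatibility" amounts to something as small
as the following toy helper (hypothetical code, not project tooling):

    // Illustrative only.
    final case class ReleaseVersion(major: Int, minor: Int, maintenance: Int)

    object ReleaseVersion {
      def parse(s: String): ReleaseVersion = s.split("\\.") match {
        case Array(ma, mi, fix) => ReleaseVersion(ma.toInt, mi.toInt, fix.toInt)
        case _ => sys.error(s"not a [MAJOR].[MINOR].[MAINTENANCE] version: $s")
      }

      // Per the proposal, code built against release `a` should still compile
      // against any release `b` that shares the same major version.
      def apiCompatible(a: ReleaseVersion, b: ReleaseVersion): Boolean =
        a.major == b.major
    }

    // ReleaseVersion.apiCompatible(
    //   ReleaseVersion.parse("1.0.0"), ReleaseVersion.parse("1.3.2"))  // true
    // ReleaseVersion.apiCompatible(
    //   ReleaseVersion.parse("1.9.1"), ReleaseVersion.parse("2.0.0"))  // false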

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Henry Saputra
Thanks Patrick for initiating the discussion about the next road map for Apache Spark.

I am +1 for 0.10.0 as the next version.

It will give us as a community some time to digest the process and the
vision and make adjustments accordingly.

Releasing 1.0.0 is a huge milestone, and if we do need to break the API
somehow or modify internal behavior dramatically, we could take
advantage of the 1.0.0 release as a good opportunity to do so.


- Henry



On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash  wrote:
> Agree on timeboxed releases as well.
>
> Is there a vision for where we want to be as a project before declaring the
> first 1.0 release?  While we're in the 0.x days per semver we can break
> backcompat at will (though we try to avoid it where possible), and that
> luxury goes away with 1.x. I just don't want to release a 1.0 simply
> because it seems to follow after 0.9 rather than making an intentional
> decision that we're at the point where we can stand by the current APIs and
> binary compatibility for the next year or so of the major release.
>
> Until that decision is made as a group I'd rather we do an immediate
> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later,
> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
> but not the other way around.
>
> https://github.com/apache/incubator-spark/pull/542
>
> Cheers!
> Andrew
>
>
> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun wrote:
>
>> +1 on time boxed releases and compatibility guidelines
>>
>>
>> > On 06.02.2014 at 01:20, Patrick Wendell wrote:
>> >
>> > Hi Everyone,
>> >
>> > In an effort to coordinate development amongst the growing list of
>> > Spark contributors, I've taken some time to write up a proposal to
>> > formalize various pieces of the development process. The next release
>> > of Spark will likely be Spark 1.0.0, so this message is intended in
>> > part to coordinate the release plan for 1.0.0 and future releases.
>> > I'll post this on the wiki after discussing it on this thread as
>> > tentative project guidelines.
>> >
>> > == Spark Release Structure ==
>> > Starting with Spark 1.0.0, the Spark project will follow the semantic
>> > versioning guidelines (http://semver.org/) with a few deviations.
>> > These small differences account for Spark's nature as a multi-module
>> > project.
>> >
>> > Each Spark release will be versioned:
>> > [MAJOR].[MINOR].[MAINTENANCE]
>> >
>> > All releases with the same major version number will have API
>> > compatibility, defined as [1]. Major version numbers will remain
>> > stable over long periods of time. For instance, 1.X.Y may last 1 year
>> > or more.
>> >
>> > Minor releases will typically contain new features and improvements.
>> > The target frequency for minor releases is every 3-4 months. One
>> > change we'd like to make is to announce fixed release dates and merge
>> > windows for each release, to facilitate coordination. Each minor
>> > release will have a merge window where new patches can be merged, a QA
>> > window when only fixes can be merged, then a final period where voting
>> > occurs on release candidates. These windows will be announced
>> > immediately after the previous minor release to give people plenty of
>> > time, and over time, we might make the whole release process more
>> > regular (similar to Ubuntu). At the bottom of this document is an
>> > example window for the 1.0.0 release.
>> >
>> > Maintenance releases will occur more frequently and depend on specific
>> > patches introduced (e.g. bug fixes) and their urgency. In general
>> > these releases are designed to patch bugs. However, higher level
>> > libraries may introduce small features, such as a new algorithm,
>> > provided they are entirely additive and isolated from existing code
>> > paths. Spark core may not introduce any features.
>> >
>> > When new components are added to Spark, they may initially be marked
>> > as "alpha". Alpha components do not have to abide by the above
>> > guidelines, however, to the maximum extent possible, they should try
>> > to. Once they are marked "stable" they have to follow these
>> > guidelines. At present, GraphX is the only alpha component of Spark.
>> >
>> > [1] API compatibility:
>> >
>> > An API is any public class or interface exposed in Spark that is not
>> > marked as semi-private or experimental. Release A is API compatible
>> > with release B if code compiled against release A *compiles cleanly*
>> > against B. This does not guarantee that a compiled application that is
>> > linked against version A will link cleanly against version B without
>> > re-compiling. Link-level compatibility is something we'll try to
>> > guarantee as well, and we might make it a requirement in the
>> > future, but challenges with things like Scala versions have made this
>> > difficult to guarantee in the past.
>> >
>> > == Merging Pull Requests ==
>> > To merge pull requests, committers are encouraged to use this tool [2]
>> > to collapse the request into one comm
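
The distinction drawn above between compiling cleanly and linking cleanly is
easiest to see with a small Scala sketch (hypothetical names, not actual
Spark classes):

    // Release A of a hypothetical public API exposes:
    //   trait JobListener { def onJobStart(): Unit }
    //
    // Release B adds a member with a default body:
    trait JobListener {
      def onJobStart(): Unit
      def onJobEnd(): Unit = ()   // new concrete trait method
    }

    // Code written against release A still compiles unchanged against
    // release B, so the two releases are API (source) compatible:
    class MyListener extends JobListener {
      def onJobStart(): Unit = println("job started")
    }

    // But a MyListener that was *compiled* against release A carries no
    // compiler-generated forwarder for onJobEnd (Scala 2.10 implements
    // concrete trait methods via forwarders mixed in at compile time), so
    // calling onJobEnd() on such an instance at runtime can fail with
    // AbstractMethodError until the class is recompiled -- link-level
    // breakage without any source-level breakage.

This kind of trait change is one of the Scala-specific reasons the proposal
stops short of promising link-level compatibility outright.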

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Patrick Wendell
> I like Heiko's proposal that requires every pull request to reference a
> JIRA.  This is how things are done in Hadoop and it makes it much easier
> to, for example, find out whether an issue you came across when googling
> for an error is in a release.

I think this is a good idea and something on which there is wide
consensus. I separately was going to suggest this in a later e-mail
(it's not directly tied to versioning). One of many reasons this is
necessary is that it's becoming hard to track which features ended
up in which releases.
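
As a rough illustration of what "every pull request references a JIRA" could
mean mechanically, a check as small as the following would do (the helper
name and key format are hypothetical, and where such a check would run is a
separate question):

    // Illustrative only: accept a pull request title only if it mentions a
    // JIRA key such as SPARK-1234.
    object JiraReferenceCheck {
      private val JiraKey = raw"SPARK-\d+".r

      def referencesJira(prTitle: String): Boolean =
        JiraKey.findFirstIn(prTitle).isDefined
    }

    // JiraReferenceCheck.referencesJira("SPARK-1234: tighten API checks")  // true
    // JiraReferenceCheck.referencesJira("Fix typo in docs")                // false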

> I agree with Mridul about binary compatibility.  It can be a dealbreaker
> for organizations that are considering an upgrade. The two ways I'm aware
> of that cause binary incompatibility are Scala version upgrades and messing
> around with inheritance.  Are these not avoidable at least for minor
> releases?

This is clearly a goal but I'm hesitant to codify it until we
understand all of the reasons why it might not work. I've heard in
general with Scala there are many non-obvious things that can break
binary compatibility and we need to understand what they are. I'd
propose we add the migration tool [1] here to our build and use it for
a few months and see what happens (hat tip to Michael Armbrust).

It's easy to formalize this as a requirement later; it's impossible to
go the other direction. For Scala major versions it's possible we can
cross-build between 2.10 and 2.11 to retain link-level compatibility.
It's just entirely uncharted territory and AFAIK no one who's
suggesting this is speaking from experience maintaining this guarantee
for a Scala project.

That would be the strongest convincing reason for me - if someone has
actually done this in the past in a Scala project and speaks from
experience. Most of us are speaking from the perspective of Java
projects where we understand well the trade-off's and costs of
maintaining this guarantee.

[1] https://github.com/typesafehub/migration-manager

- Patrick
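
For concreteness, wiring the migration manager into an sbt build looks
roughly like the sketch below (a sketch only: the plugin version shown is a
placeholder and the setting names have varied across plugin releases):

    // project/plugins.sbt
    addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "0.1.6")

    // build.sbt -- compare the current spark-core build against a previously
    // released artifact and report binary incompatibilities
    import com.typesafe.tools.mima.plugin.MimaPlugin.mimaDefaultSettings
    import com.typesafe.tools.mima.plugin.MimaKeys.previousArtifact

    mimaDefaultSettings

    previousArtifact := Some("org.apache.spark" %% "spark-core" % "0.9.0")

Running the plugin's report task on every pull request would then surface
accidental binary breaks long before a release candidate is cut.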


Re: Proposal for Spark Release Strategy

2014-02-06 Thread Sandy Ryza
Thanks for all this Patrick.

I like Heiko's proposal that requires every pull request to reference a
JIRA.  This is how things are done in Hadoop and it makes it much easier
to, for example, find out whether an issue you came across when googling
for an error is in a release.

I agree with Mridul about binary compatibility.  It can be a dealbreaker
for organizations that are considering an upgrade. The two ways I'm aware
of that cause binary incompatibility are Scala version upgrades and messing
around with inheritance.  Are these not avoidable at least for minor
releases?

-Sandy




On Thu, Feb 6, 2014 at 12:49 AM, Mridul Muralidharan wrote:

> The reason I explicitly mentioned binary compatibility was
> because it was sort of hand-waved in the proposal as merely good to have.
> My understanding is that scala does make it painful to ensure binary
> compatibility - but stability of interfaces is vital to ensure
> dependable platforms.
> Recompilation might be a viable option for developers - not for users.
>
> Regards,
> Mridul
>
>
> On Thu, Feb 6, 2014 at 12:08 PM, Patrick Wendell 
> wrote:
> > If people feel that merging the intermediate SNAPSHOT number is
> > significant, let's just defer merging that until this discussion
> > concludes.
> >
> > That said - the decision to settle on 1.0 for the next release is not
> > just because it happens to come after 0.9. It's a conscientious
> > decision based on the development of the project to this point. A
> > major focus of the 0.9 release was tying off loose ends in terms of
> > backwards compatibility (e.g. spark configuration). There was some
> > discussion back then of maybe cutting a 1.0 release but the decision
> > was deferred until after 0.9.
> >
> > @mridul - please see the original post for discussion about binary
> compatibility.
> >
> > On Wed, Feb 5, 2014 at 10:20 PM, Andy Konwinski 
> wrote:
> >> +1 for 0.10.0 now with the option to switch to 1.0.0 after further
> >> discussion.
> >> On Feb 5, 2014 9:53 PM, "Andrew Ash"  wrote:
> >>
> >>> Agree on timeboxed releases as well.
> >>>
> >>> Is there a vision for where we want to be as a project before
> declaring the
> >>> first 1.0 release?  While we're in the 0.x days per semver we can break
> >>> backcompat at will (though we try to avoid it where possible), and that
> >>> luxury goes away with 1.x. I just don't want to release a 1.0 simply
> >>> because it seems to follow after 0.9 rather than making an intentional
> >>> decision that we're at the point where we can stand by the current
> APIs and
> >>> binary compatibility for the next year or so of the major release.
> >>>
> >>> Until that decision is made as a group I'd rather we do an immediate
> >>> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it
> later,
> >>> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to
> 1.0
> >>> but not the other way around.
> >>>
> >>> https://github.com/apache/incubator-spark/pull/542
> >>>
> >>> Cheers!
> >>> Andrew
> >>>
> >>>
> >>> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun  >>> >wrote:
> >>>
> >>> > +1 on time boxed releases and compatibility guidelines
> >>> >
> >>> >
> >>> > > On 06.02.2014 at 01:20, Patrick Wendell wrote:
> >>> > >
> >>> > > Hi Everyone,
> >>> > >
> >>> > > In an effort to coordinate development amongst the growing list of
> >>> > > Spark contributors, I've taken some time to write up a proposal to
> >>> > > formalize various pieces of the development process. The next
> release
> >>> > > of Spark will likely be Spark 1.0.0, so this message is intended in
> >>> > > part to coordinate the release plan for 1.0.0 and future releases.
> >>> > > I'll post this on the wiki after discussing it on this thread as
> >>> > > tentative project guidelines.
> >>> > >
> >>> > > == Spark Release Structure ==
> >>> > > Starting with Spark 1.0.0, the Spark project will follow the
> semantic
> >>> > > versioning guidelines (http://semver.org/) with a few deviations.
> >>> > > These small differences account for Spark's nature as a
> multi-module
> >>> > > project.
> >>> > >
> >>> > > Each Spark release will be versioned:
> >>> > > [MAJOR].[MINOR].[MAINTENANCE]
> >>> > >
> >>> > > All releases with the same major version number will have API
> >>> > > compatibility, defined as [1]. Major version numbers will remain
> >>> > > stable over long periods of time. For instance, 1.X.Y may last 1
> year
> >>> > > or more.
> >>> > >
> >>> > > Minor releases will typically contain new features and
> improvements.
> >>> > > The target frequency for minor releases is every 3-4 months. One
> >>> > > change we'd like to make is to announce fixed release dates and
> merge
> >>> > > windows for each release, to facilitate coordination. Each minor
> >>> > > release will have a merge window where new patches can be merged,
> a QA
> >>> > > window when only fixes can be merged, then a final period where
> voting
> >>> > > occurs on release candidates. These windows will be announced
> >

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Mridul Muralidharan
The reason I explicitly mentioned binary compatibility was
because it was sort of hand-waved in the proposal as merely good to have.
My understanding is that scala does make it painful to ensure binary
compatibility - but stability of interfaces is vital to ensure
dependable platforms.
Recompilation might be a viable option for developers - not for users.

Regards,
Mridul


On Thu, Feb 6, 2014 at 12:08 PM, Patrick Wendell  wrote:
> If people feel that merging the intermediate SNAPSHOT number is
> significant, let's just defer merging that until this discussion
> concludes.
>
> That said - the decision to settle on 1.0 for the next release is not
> just because it happens to come after 0.9. It's a conscientious
> decision based on the development of the project to this point. A
> major focus of the 0.9 release was tying off loose ends in terms of
> backwards compatibility (e.g. spark configuration). There was some
> discussion back then of maybe cutting a 1.0 release but the decision
> was deferred until after 0.9.
>
> @mridul - please see the original post for discussion about binary 
> compatibility.
>
> On Wed, Feb 5, 2014 at 10:20 PM, Andy Konwinski  
> wrote:
>> +1 for 0.10.0 now with the option to switch to 1.0.0 after further
>> discussion.
>> On Feb 5, 2014 9:53 PM, "Andrew Ash"  wrote:
>>
>>> Agree on timeboxed releases as well.
>>>
>>> Is there a vision for where we want to be as a project before declaring the
>>> first 1.0 release?  While we're in the 0.x days per semver we can break
>>> backcompat at will (though we try to avoid it where possible), and that
>>> luxury goes away with 1.x. I just don't want to release a 1.0 simply
>>> because it seems to follow after 0.9 rather than making an intentional
>>> decision that we're at the point where we can stand by the current APIs and
>>> binary compatibility for the next year or so of the major release.
>>>
>>> Until that decision is made as a group I'd rather we do an immediate
>>> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later,
>>> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
>>> but not the other way around.
>>>
>>> https://github.com/apache/incubator-spark/pull/542
>>>
>>> Cheers!
>>> Andrew
>>>
>>>
>>> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun >> >wrote:
>>>
>>> > +1 on time boxed releases and compatibility guidelines
>>> >
>>> >
>>> > > On 06.02.2014 at 01:20, Patrick Wendell wrote:
>>> > >
>>> > > Hi Everyone,
>>> > >
>>> > > In an effort to coordinate development amongst the growing list of
>>> > > Spark contributors, I've taken some time to write up a proposal to
>>> > > formalize various pieces of the development process. The next release
>>> > > of Spark will likely be Spark 1.0.0, so this message is intended in
>>> > > part to coordinate the release plan for 1.0.0 and future releases.
>>> > > I'll post this on the wiki after discussing it on this thread as
>>> > > tentative project guidelines.
>>> > >
>>> > > == Spark Release Structure ==
>>> > > Starting with Spark 1.0.0, the Spark project will follow the semantic
>>> > > versioning guidelines (http://semver.org/) with a few deviations.
>>> > > These small differences account for Spark's nature as a multi-module
>>> > > project.
>>> > >
>>> > > Each Spark release will be versioned:
>>> > > [MAJOR].[MINOR].[MAINTENANCE]
>>> > >
>>> > > All releases with the same major version number will have API
>>> > > compatibility, defined as [1]. Major version numbers will remain
>>> > > stable over long periods of time. For instance, 1.X.Y may last 1 year
>>> > > or more.
>>> > >
>>> > > Minor releases will typically contain new features and improvements.
>>> > > The target frequency for minor releases is every 3-4 months. One
>>> > > change we'd like to make is to announce fixed release dates and merge
>>> > > windows for each release, to facilitate coordination. Each minor
>>> > > release will have a merge window where new patches can be merged, a QA
>>> > > window when only fixes can be merged, then a final period where voting
>>> > > occurs on release candidates. These windows will be announced
>>> > > immediately after the previous minor release to give people plenty of
>>> > > time, and over time, we might make the whole release process more
>>> > > regular (similar to Ubuntu). At the bottom of this document is an
>>> > > example window for the 1.0.0 release.
>>> > >
>>> > > Maintenance releases will occur more frequently and depend on specific
>>> > > patches introduced (e.g. bug fixes) and their urgency. In general
>>> > > these releases are designed to patch bugs. However, higher level
>>> > > libraries may introduce small features, such as a new algorithm,
>>> > > provided they are entirely additive and isolated from existing code
>>> > > paths. Spark core may not introduce any features.
>>> > >
>>> > > When new components are added to Spark, they may initially be marked
>>> > > as "alpha". Alpha components do not have to abide by t

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Heiko Braun
If we could minimize the external dependencies, it would certainly be 
beneficial long term. 


> On 06.02.2014 at 07:37, Mridul Muralidharan wrote:
> 
> 
> b) minimize external dependencies - some of them would go away/not be
> actively maintained.


Re: Proposal for Spark Release Strategy

2014-02-05 Thread Patrick Wendell
If people feel that merging the intermediate SNAPSHOT number is
significant, let's just defer merging that until this discussion
concludes.

That said - the decision to settle on 1.0 for the next release is not
just because it happens to come after 0.9. It's a conscientious
decision based on the development of the project to this point. A
major focus of the 0.9 release was tying off loose ends in terms of
backwards compatibility (e.g. spark configuration). There was some
discussion back then of maybe cutting a 1.0 release but the decision
was deferred until after 0.9.

@mridul - please see the original post for discussion about binary compatibility.

On Wed, Feb 5, 2014 at 10:20 PM, Andy Konwinski  wrote:
> +1 for 0.10.0 now with the option to switch to 1.0.0 after further
> discussion.
> On Feb 5, 2014 9:53 PM, "Andrew Ash"  wrote:
>
>> Agree on timeboxed releases as well.
>>
>> Is there a vision for where we want to be as a project before declaring the
>> first 1.0 release?  While we're in the 0.x days per semver we can break
>> backcompat at will (though we try to avoid it where possible), and that
>> luxury goes away with 1.x. I just don't want to release a 1.0 simply
>> because it seems to follow after 0.9 rather than making an intentional
>> decision that we're at the point where we can stand by the current APIs and
>> binary compatibility for the next year or so of the major release.
>>
>> Until that decision is made as a group I'd rather we do an immediate
>> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later,
>> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
>> but not the other way around.
>>
>> https://github.com/apache/incubator-spark/pull/542
>>
>> Cheers!
>> Andrew
>>
>>
>> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun > >wrote:
>>
>> > +1 on time boxed releases and compatibility guidelines
>> >
>> >
>> > > On 06.02.2014 at 01:20, Patrick Wendell wrote:
>> > >
>> > > Hi Everyone,
>> > >
>> > > In an effort to coordinate development amongst the growing list of
>> > > Spark contributors, I've taken some time to write up a proposal to
>> > > formalize various pieces of the development process. The next release
>> > > of Spark will likely be Spark 1.0.0, so this message is intended in
>> > > part to coordinate the release plan for 1.0.0 and future releases.
>> > > I'll post this on the wiki after discussing it on this thread as
>> > > tentative project guidelines.
>> > >
>> > > == Spark Release Structure ==
>> > > Starting with Spark 1.0.0, the Spark project will follow the semantic
>> > > versioning guidelines (http://semver.org/) with a few deviations.
>> > > These small differences account for Spark's nature as a multi-module
>> > > project.
>> > >
>> > > Each Spark release will be versioned:
>> > > [MAJOR].[MINOR].[MAINTENANCE]
>> > >
>> > > All releases with the same major version number will have API
>> > > compatibility, defined as [1]. Major version numbers will remain
>> > > stable over long periods of time. For instance, 1.X.Y may last 1 year
>> > > or more.
>> > >
>> > > Minor releases will typically contain new features and improvements.
>> > > The target frequency for minor releases is every 3-4 months. One
>> > > change we'd like to make is to announce fixed release dates and merge
>> > > windows for each release, to facilitate coordination. Each minor
>> > > release will have a merge window where new patches can be merged, a QA
>> > > window when only fixes can be merged, then a final period where voting
>> > > occurs on release candidates. These windows will be announced
>> > > immediately after the previous minor release to give people plenty of
>> > > time, and over time, we might make the whole release process more
>> > > regular (similar to Ubuntu). At the bottom of this document is an
>> > > example window for the 1.0.0 release.
>> > >
>> > > Maintenance releases will occur more frequently and depend on specific
>> > > patches introduced (e.g. bug fixes) and their urgency. In general
>> > > these releases are designed to patch bugs. However, higher level
>> > > libraries may introduce small features, such as a new algorithm,
>> > > provided they are entirely additive and isolated from existing code
>> > > paths. Spark core may not introduce any features.
>> > >
>> > > When new components are added to Spark, they may initially be marked
>> > > as "alpha". Alpha components do not have to abide by the above
>> > > guidelines, however, to the maximum extent possible, they should try
>> > > to. Once they are marked "stable" they have to follow these
>> > > guidelines. At present, GraphX is the only alpha component of Spark.
>> > >
>> > > [1] API compatibility:
>> > >
>> > > An API is any public class or interface exposed in Spark that is not
>> > > marked as semi-private or experimental. Release A is API compatible
>> > > with release B if code compiled against release A *compiles cleanly*
>> > > against B. This does not guarantee tha

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Mridul Muralidharan
Before we move to 1.0, we need to address two things:

a) backward compatibility not just at the API level, but also at the
binary level (not forcing a recompile) - see the sketch below.

b) minimize external dependencies - some of them may go away or no
longer be actively maintained.
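
As a rough illustration of point a): a binary-compatibility check has to
compare the compiled method signatures of the old and new classes, not
just whether a caller's source still compiles. The sketch below is
illustration only - the object and method names are made up here, and
real tooling (e.g. MiMa) does far more:

    // Illustration only: signatures present in the old class but missing
    // in the new one will break already-compiled callers at link time.
    object BinaryCompatSketch {
      private def signatures(c: Class[_]): Set[String] =
        c.getMethods.map { m =>
          s"${m.getName}(${m.getParameterTypes.map(_.getName).mkString(",")})"
        }.toSet

      /** Public method signatures that callers compiled against `old` would miss. */
      def missingIn(old: Class[_], updated: Class[_]): Set[String] =
        signatures(old) -- signatures(updated)
    }

An empty result is necessary (though not sufficient) for not forcing a
recompile.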


Regards,
Mridul


On Thu, Feb 6, 2014 at 11:50 AM, Andy Konwinski  wrote:
> +1 for 0.10.0 now with the option to switch to 1.0.0 after further
> discussion.
> On Feb 5, 2014 9:53 PM, "Andrew Ash"  wrote:
>
>> Agree on timeboxed releases as well.
>>
>> Is there a vision for where we want to be as a project before declaring the
>> first 1.0 release?  While we're in the 0.x days per semver we can break
>> backcompat at will (though we try to avoid it where possible), and that
>> luxury goes away with 1.x. I just don't want to release a 1.0 simply
>> because it seems to follow after 0.9 rather than making an intentional
>> decision that we're at the point where we can stand by the current APIs and
>> binary compatibility for the next year or so of the major release.
>>
>> Until that decision is made as a group I'd rather we do an immediate
>> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later,
>> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
>> but not the other way around.
>>
>> https://github.com/apache/incubator-spark/pull/542
>>
>> Cheers!
>> Andrew
>>
>>
>> > On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun wrote:
>>
>> > +1 on time boxed releases and compatibility guidelines
>> >
>> >
>> > > On Feb 6, 2014, at 01:20, Patrick Wendell wrote:
>> > >
>> > > Hi Everyone,
>> > >
>> > > In an effort to coordinate development amongst the growing list of
>> > > Spark contributors, I've taken some time to write up a proposal to
>> > > formalize various pieces of the development process. The next release
>> > > of Spark will likely be Spark 1.0.0, so this message is intended in
>> > > part to coordinate the release plan for 1.0.0 and future releases.
>> > > I'll post this on the wiki after discussing it on this thread as
>> > > tentative project guidelines.
>> > >
>> > > == Spark Release Structure ==
>> > > Starting with Spark 1.0.0, the Spark project will follow the semantic
>> > > versioning guidelines (http://semver.org/) with a few deviations.
>> > > These small differences account for Spark's nature as a multi-module
>> > > project.
>> > >
>> > > Each Spark release will be versioned:
>> > > [MAJOR].[MINOR].[MAINTENANCE]
>> > >
>> > > All releases with the same major version number will have API
>> > > compatibility, defined as [1]. Major version numbers will remain
>> > > stable over long periods of time. For instance, 1.X.Y may last 1 year
>> > > or more.
>> > >
>> > > Minor releases will typically contain new features and improvements.
>> > > The target frequency for minor releases is every 3-4 months. One
>> > > change we'd like to make is to announce fixed release dates and merge
>> > > windows for each release, to facilitate coordination. Each minor
>> > > release will have a merge window where new patches can be merged, a QA
>> > > window when only fixes can be merged, then a final period where voting
>> > > occurs on release candidates. These windows will be announced
>> > > immediately after the previous minor release to give people plenty of
>> > > time, and over time, we might make the whole release process more
>> > > regular (similar to Ubuntu). At the bottom of this document is an
>> > > example window for the 1.0.0 release.
>> > >
>> > > Maintenance releases will occur more frequently and depend on specific
>> > > patches introduced (e.g. bug fixes) and their urgency. In general
>> > > these releases are designed to patch bugs. However, higher level
>> > > libraries may introduce small features, such as a new algorithm,
>> > > provided they are entirely additive and isolated from existing code
>> > > paths. Spark core may not introduce any features.
>> > >
>> > > When new components are added to Spark, they may initially be marked
>> > > as "alpha". Alpha components do not have to abide by the above
>> > > guidelines, however, to the maximum extent possible, they should try
>> > > to. Once they are marked "stable" they have to follow these
>> > > guidelines. At present, GraphX is the only alpha component of Spark.
>> > >
>> > > [1] API compatibility:
>> > >
>> > > An API is any public class or interface exposed in Spark that is not
>> > > marked as semi-private or experimental. Release A is API compatible
>> > > with release B if code compiled against release A *compiles cleanly*
>> > > against B. This does not guarantee that a compiled application that is
>> > > linked against version A will link cleanly against version B without
>> > > re-compiling. Link-level compatibility is something we'll try to
>> > > guarantee as well, and we might make it a requirement in the
>> > > future, but challenges with things like Scala versions have made this
>> > > difficult to guarantee in the past.
>> > >
>> > > == 

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Andy Konwinski
+1 for 0.10.0 now with the option to switch to 1.0.0 after further
discussion.
On Feb 5, 2014 9:53 PM, "Andrew Ash"  wrote:

> Agree on timeboxed releases as well.
>
> Is there a vision for where we want to be as a project before declaring the
> first 1.0 release?  While we're in the 0.x days per semver we can break
> backcompat at will (though we try to avoid it where possible), and that
> luxury goes away with 1.x. I just don't want to release a 1.0 simply
> because it seems to follow after 0.9 rather than making an intentional
> decision that we're at the point where we can stand by the current APIs and
> binary compatibility for the next year or so of the major release.
>
> Until that decision is made as a group I'd rather we do an immediate
> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later,
> replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
> but not the other way around.
>
> https://github.com/apache/incubator-spark/pull/542
>
> Cheers!
> Andrew
>
>
> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun wrote:
>
> > +1 on time boxed releases and compatibility guidelines
> >
> >
> > > On Feb 6, 2014, at 01:20, Patrick Wendell wrote:
> > >
> > > Hi Everyone,
> > >
> > > In an effort to coordinate development amongst the growing list of
> > > Spark contributors, I've taken some time to write up a proposal to
> > > formalize various pieces of the development process. The next release
> > > of Spark will likely be Spark 1.0.0, so this message is intended in
> > > part to coordinate the release plan for 1.0.0 and future releases.
> > > I'll post this on the wiki after discussing it on this thread as
> > > tentative project guidelines.
> > >
> > > == Spark Release Structure ==
> > > Starting with Spark 1.0.0, the Spark project will follow the semantic
> > > versioning guidelines (http://semver.org/) with a few deviations.
> > > These small differences account for Spark's nature as a multi-module
> > > project.
> > >
> > > Each Spark release will be versioned:
> > > [MAJOR].[MINOR].[MAINTENANCE]
> > >
> > > All releases with the same major version number will have API
> > > compatibility, defined as [1]. Major version numbers will remain
> > > stable over long periods of time. For instance, 1.X.Y may last 1 year
> > > or more.
> > >
> > > Minor releases will typically contain new features and improvements.
> > > The target frequency for minor releases is every 3-4 months. One
> > > change we'd like to make is to announce fixed release dates and merge
> > > windows for each release, to facilitate coordination. Each minor
> > > release will have a merge window where new patches can be merged, a QA
> > > window when only fixes can be merged, then a final period where voting
> > > occurs on release candidates. These windows will be announced
> > > immediately after the previous minor release to give people plenty of
> > > time, and over time, we might make the whole release process more
> > > regular (similar to Ubuntu). At the bottom of this document is an
> > > example window for the 1.0.0 release.
> > >
> > > Maintenance releases will occur more frequently and depend on specific
> > > patches introduced (e.g. bug fixes) and their urgency. In general
> > > these releases are designed to patch bugs. However, higher level
> > > libraries may introduce small features, such as a new algorithm,
> > > provided they are entirely additive and isolated from existing code
> > > paths. Spark core may not introduce any features.
> > >
> > > When new components are added to Spark, they may initially be marked
> > > as "alpha". Alpha components do not have to abide by the above
> > > guidelines, however, to the maximum extent possible, they should try
> > > to. Once they are marked "stable" they have to follow these
> > > guidelines. At present, GraphX is the only alpha component of Spark.
> > >
> > > [1] API compatibility:
> > >
> > > An API is any public class or interface exposed in Spark that is not
> > > marked as semi-private or experimental. Release A is API compatible
> > > with release B if code compiled against release A *compiles cleanly*
> > > against B. This does not guarantee that a compiled application that is
> > > linked against version A will link cleanly against version B without
> > > re-compiling. Link-level compatibility is something we'll try to
> > > guarantee as well, and we might make it a requirement in the
> > > future, but challenges with things like Scala versions have made this
> > > difficult to guarantee in the past.
> > >
> > > == Merging Pull Requests ==
> > > To merge pull requests, committers are encouraged to use this tool [2]
> > > to collapse the request into one commit rather than manually
> > > performing git merges. It will also format the commit message nicely
> > > in a way that can be easily parsed later when writing credits.
> > > Currently it is maintained in a public utility repository, but we'll
> > > merge it into mainline Spar

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Andrew Ash
Agree on timeboxed releases as well.

Is there a vision for where we want to be as a project before declaring the
first 1.0 release?  While we're in the 0.x days per semver we can break
backcompat at will (though we try to avoid it where possible), and that
luxury goes away with 1.x. I just don't want to release a 1.0 simply
because it seems to follow after 0.9 rather than making an intentional
decision that we're at the point where we can stand by the current APIs and
binary compatibility for the next year or so of the major release.

Until that decision is made as a group I'd rather we do an immediate
version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later,
replace that with 1.0.0-SNAPSHOT.  It's very easy to go from 0.10 to 1.0
but not the other way around.

https://github.com/apache/incubator-spark/pull/542

Cheers!
Andrew


On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun wrote:

> +1 on time boxed releases and compatibility guidelines
>
>
> > > On Feb 6, 2014, at 01:20, Patrick Wendell wrote:
> >
> > Hi Everyone,
> >
> > In an effort to coordinate development amongst the growing list of
> > Spark contributors, I've taken some time to write up a proposal to
> > formalize various pieces of the development process. The next release
> > of Spark will likely be Spark 1.0.0, so this message is intended in
> > part to coordinate the release plan for 1.0.0 and future releases.
> > I'll post this on the wiki after discussing it on this thread as
> > tentative project guidelines.
> >
> > == Spark Release Structure ==
> > Starting with Spark 1.0.0, the Spark project will follow the semantic
> > versioning guidelines (http://semver.org/) with a few deviations.
> > These small differences account for Spark's nature as a multi-module
> > project.
> >
> > Each Spark release will be versioned:
> > [MAJOR].[MINOR].[MAINTENANCE]
> >
> > All releases with the same major version number will have API
> > compatibility, defined as [1]. Major version numbers will remain
> > stable over long periods of time. For instance, 1.X.Y may last 1 year
> > or more.
> >
> > Minor releases will typically contain new features and improvements.
> > The target frequency for minor releases is every 3-4 months. One
> > change we'd like to make is to announce fixed release dates and merge
> > windows for each release, to facilitate coordination. Each minor
> > release will have a merge window where new patches can be merged, a QA
> > window when only fixes can be merged, then a final period where voting
> > occurs on release candidates. These windows will be announced
> > immediately after the previous minor release to give people plenty of
> > time, and over time, we might make the whole release process more
> > regular (similar to Ubuntu). At the bottom of this document is an
> > example window for the 1.0.0 release.
> >
> > Maintenance releases will occur more frequently and depend on specific
> > patches introduced (e.g. bug fixes) and their urgency. In general
> > these releases are designed to patch bugs. However, higher level
> > libraries may introduce small features, such as a new algorithm,
> > provided they are entirely additive and isolated from existing code
> > paths. Spark core may not introduce any features.
> >
> > When new components are added to Spark, they may initially be marked
> > as "alpha". Alpha components do not have to abide by the above
> > guidelines, however, to the maximum extent possible, they should try
> > to. Once they are marked "stable" they have to follow these
> > guidelines. At present, GraphX is the only alpha component of Spark.
> >
> > [1] API compatibility:
> >
> > An API is any public class or interface exposed in Spark that is not
> > marked as semi-private or experimental. Release A is API compatible
> > with release B if code compiled against release A *compiles cleanly*
> > against B. This does not guarantee that a compiled application that is
> > linked against version A will link cleanly against version B without
> > re-compiling. Link-level compatibility is something we'll try to
> > guarantee as well, and we might make it a requirement in the
> > future, but challenges with things like Scala versions have made this
> > difficult to guarantee in the past.
> >
> > == Merging Pull Requests ==
> > To merge pull requests, committers are encouraged to use this tool [2]
> > to collapse the request into one commit rather than manually
> > performing git merges. It will also format the commit message nicely
> > in a way that can be easily parsed later when writing credits.
> > Currently it is maintained in a public utility repository, but we'll
> > merge it into mainline Spark soon.
> >
> > [2]
> https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py
> >
> > == Tentative Release Window for 1.0.0 ==
> > Feb 1st - April 1st: General development
> > April 1st: Code freeze for new features
> > April 15th: RC1
> >
> > == Deviations ==
> > For now, the proposal is to cons

Re: Proposal for Spark Release Strategy

2014-02-05 Thread Heiko Braun
+1 on time boxed releases and compatibility guidelines


> On Feb 6, 2014, at 01:20, Patrick Wendell wrote:
> 
> Hi Everyone,
> 
> In an effort to coordinate development amongst the growing list of
> Spark contributors, I've taken some time to write up a proposal to
> formalize various pieces of the development process. The next release
> of Spark will likely be Spark 1.0.0, so this message is intended in
> part to coordinate the release plan for 1.0.0 and future releases.
> I'll post this on the wiki after discussing it on this thread as
> tentative project guidelines.
> 
> == Spark Release Structure ==
> Starting with Spark 1.0.0, the Spark project will follow the semantic
> versioning guidelines (http://semver.org/) with a few deviations.
> These small differences account for Spark's nature as a multi-module
> project.
> 
> Each Spark release will be versioned:
> [MAJOR].[MINOR].[MAINTENANCE]
> 
> All releases with the same major version number will have API
> compatibility, defined as [1]. Major version numbers will remain
> stable over long periods of time. For instance, 1.X.Y may last 1 year
> or more.
> 
> Minor releases will typically contain new features and improvements.
> The target frequency for minor releases is every 3-4 months. One
> change we'd like to make is to announce fixed release dates and merge
> windows for each release, to facilitate coordination. Each minor
> release will have a merge window where new patches can be merged, a QA
> window when only fixes can be merged, then a final period where voting
> occurs on release candidates. These windows will be announced
> immediately after the previous minor release to give people plenty of
> time, and over time, we might make the whole release process more
> regular (similar to Ubuntu). At the bottom of this document is an
> example window for the 1.0.0 release.
> 
> Maintenance releases will occur more frequently and depend on specific
> patches introduced (e.g. bug fixes) and their urgency. In general
> these releases are designed to patch bugs. However, higher level
> libraries may introduce small features, such as a new algorithm,
> provided they are entirely additive and isolated from existing code
> paths. Spark core may not introduce any features.
> 
> When new components are added to Spark, they may initially be marked
> as "alpha". Alpha components do not have to abide by the above
> guidelines, however, to the maximum extent possible, they should try
> to. Once they are marked "stable" they have to follow these
> guidelines. At present, GraphX is the only alpha component of Spark.
> 
> [1] API compatibility:
> 
> An API is any public class or interface exposed in Spark that is not
> marked as semi-private or experimental. Release A is API compatible
> with release B if code compiled against release A *compiles cleanly*
> against B. This does not guarantee that a compiled application that is
> linked against version A will link cleanly against version B without
> re-compiling. Link-level compatibility is something we'll try to
> guarantee as well, and we might make it a requirement in the
> future, but challenges with things like Scala versions have made this
> difficult to guarantee in the past.
> 
> == Merging Pull Requests ==
> To merge pull requests, committers are encouraged to use this tool [2]
> to collapse the request into one commit rather than manually
> performing git merges. It will also format the commit message nicely
> in a way that can be easily parsed later when writing credits.
> Currently it is maintained in a public utility repository, but we'll
> merge it into mainline Spark soon.
> 
> [2] https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py
> 
> == Tentative Release Window for 1.0.0 ==
> Feb 1st - April 1st: General development
> April 1st: Code freeze for new features
> April 15th: RC1
> 
> == Deviations ==
> For now, the proposal is to consider these tentative guidelines. We
> can vote to formalize these as project rules at a later time after
> some experience working with them. Once formalized, any deviation from
> these guidelines will be subject to a lazy majority vote.
> 
> - Patrick


Re: Proposal for Spark Release Strategy

2014-02-05 Thread Heiko Braun
I would take it even further when it comes to PRs:

- any PR needs to reference a JIRA (a minimal title check is sketched below)
- the PR should be rebased before submitting, to avoid merge commits
- as Patrick said: require squashed commits

/heiko
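
To make the first point concrete, the JIRA check on a PR title can be a
one-line regular expression. This is only a sketch - the object name,
the SPARK-NNNN pattern, and how it would be wired into Jenkins are all
assumptions, not existing Spark tooling:

    // Sketch: warn when a pull request title does not reference a JIRA ticket.
    object PrTitleCheck {
      private val JiraRef = """SPARK-\d+""".r

      def referencesJira(title: String): Boolean =
        JiraRef.findFirstIn(title).isDefined

      def main(args: Array[String]): Unit = {
        val title = args.mkString(" ")
        if (!referencesJira(title)) {
          System.err.println(s"WARNING: '$title' does not reference a SPARK-NNNN ticket")
          sys.exit(1)
        }
      }
    }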




> On Feb 6, 2014, at 01:39, Mark Hamstra wrote:
> 
> I would strongly encourage that developers submitting pull requests include
> within the description of that PR whether you intend the contribution to be
> mergeable at the maintenance level, minor level, or major level.  


Re: Proposal for Spark Release Strategy

2014-02-05 Thread Mark Hamstra
Yup, the intended merge level is just a hint; the responsibility still lies
with the committers. It can be a helpful hint, though.


On Wed, Feb 5, 2014 at 4:55 PM, Patrick Wendell  wrote:

> > How are Alpha components and higher level libraries which may add small
> > features within a maintenance release going to be marked with that
> > status?  Somehow/somewhere within the code itself, or just as some kind
> > of external reference?
>
> I think we'd mark alpha features as such in the java/scaladoc. This is
> what scala does with experimental features. Higher level libraries are
> anything that isn't Spark core. Maybe we can formalize this more
> somehow.
>
> We might be able to annotate the new features as experimental if they
> end up in a patch release. This could make it more clear.
>
> >
> > I would strongly encourage that developers submitting pull requests
> > include within the description of that PR whether you intend the
> > contribution to be mergeable at the maintenance level, minor level, or
> > major level.  That will help those of us doing code reviews and merges
> > decide where the code should go and how closely to scrutinize the PR
> > for changes that are not compatible with the intended release level.
>
> I'd say the default is the minor level. If contributors know it should
> be added in a maintenance release, it's great if they say so. However
> I'd say this is also a responsibility of the committers, since
> individual contributors may not know. It will probably be a while
> before major level patches are being merged :P
>


Re: Proposal for Spark Release Strategy

2014-02-05 Thread Patrick Wendell
> How are Alpha components and higher level libraries which may add small
> features within a maintenance release going to be marked with that status?
>  Somehow/somewhere within the code itself, or just as some kind of external
> reference?

I think we'd mark alpha features as such in the java/scaladoc. This is
what scala does with experimental features. Higher level libraries are
anything that isn't Spark core. Maybe we can formalize this more
somehow.

We might be able to annotate the new features as experimental if they
end up in a patch release. This could make it more clear.
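
One possible way to formalize it - sketched here with made-up annotation
names, not an existing Spark API - would be source-level markers that
also show up in the generated scaladoc:

    // Hypothetical markers, for illustration only; the names and package
    // are assumptions and not part of Spark today.
    package org.apache.spark.annotation

    import scala.annotation.StaticAnnotation

    /** Marks a whole component (e.g. a new module) as alpha / unstable. */
    class AlphaComponent extends StaticAnnotation

    /** Marks an individual feature added in a maintenance release as experimental. */
    class Experimental extends StaticAnnotation

    // Usage on a hypothetical API:
    //   @AlphaComponent
    //   class SomeNewComponent { ... }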

>
> I would strongly encourage that developers submitting pull requests include
> within the description of that PR whether you intend the contribution to be
> mergeable at the maintenance level, minor level, or major level.  That will
> help those of us doing code reviews and merges decide where the code should
> go and how closely to scrutinize the PR for changes that are not compatible
> with the intended release level.

I'd say the default is the minor level. If contributors know it should
be added in a maintenance release, it's great if they say so. However
I'd say this is also a responsibility of the committers, since
individual contributors may not know. It will probably be a while
before major level patches are being merged :P


Re: Proposal for Spark Release Strategy

2014-02-05 Thread Mark Hamstra
Looks good.

One question and one comment:

How are Alpha components and higher level libraries which may add small
features within a maintenance release going to be marked with that status?
 Somehow/somewhere within the code itself, or just as some kind of external
reference?

I would strongly encourage that developers submitting pull requests include
within the description of that PR whether you intend the contribution to be
mergeable at the maintenance level, minor level, or major level.  That will
help those of us doing code reviews and merges decide where the code should
go and how closely to scrutinize the PR for changes that are not compatible
with the intended release level.


On Wed, Feb 5, 2014 at 4:20 PM, Patrick Wendell  wrote:

> Hi Everyone,
>
> In an effort to coordinate development amongst the growing list of
> Spark contributors, I've taken some time to write up a proposal to
> formalize various pieces of the development process. The next release
> of Spark will likely be Spark 1.0.0, so this message is intended in
> part to coordinate the release plan for 1.0.0 and future releases.
> I'll post this on the wiki after discussing it on this thread as
> tentative project guidelines.
>
> == Spark Release Structure ==
> Starting with Spark 1.0.0, the Spark project will follow the semantic
> versioning guidelines (http://semver.org/) with a few deviations.
> These small differences account for Spark's nature as a multi-module
> project.
>
> Each Spark release will be versioned:
> [MAJOR].[MINOR].[MAINTENANCE]
>
> All releases with the same major version number will have API
> compatibility, defined as [1]. Major version numbers will remain
> stable over long periods of time. For instance, 1.X.Y may last 1 year
> or more.
>
> Minor releases will typically contain new features and improvements.
> The target frequency for minor releases is every 3-4 months. One
> change we'd like to make is to announce fixed release dates and merge
> windows for each release, to facilitate coordination. Each minor
> release will have a merge window where new patches can be merged, a QA
> window when only fixes can be merged, then a final period where voting
> occurs on release candidates. These windows will be announced
> immediately after the previous minor release to give people plenty of
> time, and over time, we might make the whole release process more
> regular (similar to Ubuntu). At the bottom of this document is an
> example window for the 1.0.0 release.
>
> Maintenance releases will occur more frequently and depend on specific
> patches introduced (e.g. bug fixes) and their urgency. In general
> these releases are designed to patch bugs. However, higher level
> libraries may introduce small features, such as a new algorithm,
> provided they are entirely additive and isolated from existing code
> paths. Spark core may not introduce any features.
>
> When new components are added to Spark, they may initially be marked
> as "alpha". Alpha components do not have to abide by the above
> guidelines, however, to the maximum extent possible, they should try
> to. Once they are marked "stable" they have to follow these
> guidelines. At present, GraphX is the only alpha component of Spark.
>
> [1] API compatibility:
>
> An API is any public class or interface exposed in Spark that is not
> marked as semi-private or experimental. Release A is API compatible
> with release B if code compiled against release A *compiles cleanly*
> against B. This does not guarantee that a compiled application that is
> linked against version A will link cleanly against version B without
> re-compiling. Link-level compatibility is something we'll try to
> guarantee as well, and we might make it a requirement in the
> future, but challenges with things like Scala versions have made this
> difficult to guarantee in the past.
>
> == Merging Pull Requests ==
> To merge pull requests, committers are encouraged to use this tool [2]
> to collapse the request into one commit rather than manually
> performing git merges. It will also format the commit message nicely
> in a way that can be easily parsed later when writing credits.
> Currently it is maintained in a public utility repository, but we'll
> merge it into mainline Spark soon.
>
> [2] https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py
>
> == Tentative Release Window for 1.0.0 ==
> Feb 1st - April 1st: General development
> April 1st: Code freeze for new features
> April 15th: RC1
>
> == Deviations ==
> For now, the proposal is to consider these tentative guidelines. We
> can vote to formalize these as project rules at a later time after
> some experience working with them. Once formalized, any deviation to
> these guidelines will be subject to a lazy majority vote.
>
> - Patrick
>


Proposal for Spark Release Strategy

2014-02-05 Thread Patrick Wendell
Hi Everyone,

In an effort to coordinate development amongst the growing list of
Spark contributors, I've taken some time to write up a proposal to
formalize various pieces of the development process. The next release
of Spark will likely be Spark 1.0.0, so this message is intended in
part to coordinate the release plan for 1.0.0 and future releases.
I'll post this on the wiki after discussing it on this thread as
tentative project guidelines.

== Spark Release Structure ==
Starting with Spark 1.0.0, the Spark project will follow the semantic
versioning guidelines (http://semver.org/) with a few deviations.
These small differences account for Spark's nature as a multi-module
project.

Each Spark release will be versioned:
[MAJOR].[MINOR].[MAINTENANCE]

All releases with the same major version number will have API
compatibility, defined as [1]. Major version numbers will remain
stable over long periods of time. For instance, 1.X.Y may last 1 year
or more.
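
Stated as code, the compatibility rule above reduces to comparing the
MAJOR component. The sketch below is illustrative only (the names are
not Spark code) and strips a -SNAPSHOT style suffix before parsing:

    // Illustration of the [MAJOR].[MINOR].[MAINTENANCE] rule: two releases
    // promise API compatibility exactly when their MAJOR versions match.
    case class SparkVersion(major: Int, minor: Int, maintenance: Int)

    object SparkVersion {
      def parse(s: String): SparkVersion = {
        val Array(maj, min, maint) = s.takeWhile(_ != '-').split('.').map(_.toInt)
        SparkVersion(maj, min, maint)
      }

      def apiCompatible(a: SparkVersion, b: SparkVersion): Boolean =
        a.major == b.major
    }

    // e.g. "1.0.0" vs "1.3.2" -> compatible; "0.9.0" vs "1.0.0" -> not.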

Minor releases will typically contain new features and improvements.
The target frequency for minor releases is every 3-4 months. One
change we'd like to make is to announce fixed release dates and merge
windows for each release, to facilitate coordination. Each minor
release will have a merge window where new patches can be merged, a QA
window when only fixes can be merged, then a final period where voting
occurs on release candidates. These windows will be announced
immediately after the previous minor release to give people plenty of
time, and over time, we might make the whole release process more
regular (similar to Ubuntu). At the bottom of this document is an
example window for the 1.0.0 release.

Maintenance releases will occur more frequently and depend on specific
patches introduced (e.g. bug fixes) and their urgency. In general
these releases are designed to patch bugs. However, higher level
libraries may introduce small features, such as a new algorithm,
provided they are entirely additive and isolated from existing code
paths. Spark core may not introduce any features.

When new components are added to Spark, they may initially be marked
as "alpha". Alpha components do not have to abide by the above
guidelines, however, to the maximum extent possible, they should try
to. Once they are marked "stable" they have to follow these
guidelines. At present, GraphX is the only alpha component of Spark.

[1] API compatibility:

An API is any public class or interface exposed in Spark that is not
marked as semi-private or experimental. Release A is API compatible
with release B if code compiled against release A *compiles cleanly*
against B. This does not guarantee that a compiled application that is
linked against version A will link cleanly against version B without
re-compiling. Link-level compatibility is something we'll try to
guarantee as well, and we might make it a requirement in the
future, but challenges with things like Scala versions have made this
difficult to guarantee in the past.
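
As a small, hypothetical example of how the two notions diverge (the
class and method here are invented purely for illustration):

    // Release A of an invented library class.
    package releaseA { class Counter { def add(n: Int): Int = n } }

    // Release B: the same method gains a default parameter.
    package releaseB { class Counter { def add(n: Int, step: Int = 1): Int = n * step } }

    // A caller written as `new Counter().add(5)` compiles cleanly against
    // either release, so A and B are API compatible in the sense above. But
    // bytecode compiled against release A calls the one-argument add(I)I,
    // which no longer exists in release B, so the already-built jar fails at
    // runtime with NoSuchMethodError until it is recompiled. That is the
    // link-level gap described here.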

== Merging Pull Requests ==
To merge pull requests, committers are encouraged to use this tool [2]
to collapse the request into one commit rather than manually
performing git merges. It will also format the commit message nicely
in a way that can be easily parsed later when writing credits.
Currently it is maintained in a public utility repository, but we'll
merge it into mainline Spark soon.

[2] https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py
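
The sketch below is not that script - it is only a rough approximation,
in Scala, of the git steps such a tool performs; the branch naming and
commit message format here are assumptions:

    import scala.sys.process._

    object SquashMergeSketch {
      // author is expected in git's "Name <email>" form.
      def merge(prNumber: Int, title: String, author: String): Unit = {
        val prBranch = s"pr-$prNumber"
        // Fetch the pull request head into a local branch.
        s"git fetch origin pull/$prNumber/head:$prBranch".!
        // Squash all of the PR's commits onto master (no merge commit).
        "git checkout master".!
        s"git merge --squash $prBranch".!
        // Single commit whose message can be parsed later when writing credits.
        val message = s"$title\n\nAuthor: $author\n\nCloses #$prNumber."
        Seq("git", "commit", "--author", author, "-m", message).!
      }
    }

See [2] for the actual implementation.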

== Tentative Release Window for 1.0.0 ==
Feb 1st - April 1st: General development
April 1st: Code freeze for new features
April 15th: RC1

== Deviations ==
For now, the proposal is to consider these tentative guidelines. We
can vote to formalize these as project rules at a later time after
some experience working with them. Once formalized, any deviation from
these guidelines will be subject to a lazy majority vote.

- Patrick