Re: [SUMMARY] Proposal for Spark Release Strategy
Ok, JIRA ticket filed [1] for this one.

- Henry

[1] https://spark-project.atlassian.net/browse/SPARK-1070

On Sat, Feb 8, 2014 at 3:39 PM, Patrick Wendell wrote:
> :P - I'm pretty sure this can be done, but it will require some work.
> We already use the GitHub API in our merge script, and we could hook
> something like that up with the Jenkins tests. Henry, maybe you could
> create a JIRA for this for Spark 1.0?
>
> - Patrick
Re: [SUMMARY] Proposal for Spark Release Strategy
:) Sure thing, I will create a JIRA ticket for this.

Thx guys,
Henry

On Saturday, February 8, 2014, Patrick Wendell wrote:
> :P - I'm pretty sure this can be done, but it will require some work.
> We already use the GitHub API in our merge script, and we could hook
> something like that up with the Jenkins tests. Henry, maybe you could
> create a JIRA for this for Spark 1.0?
>
> - Patrick
Re: [SUMMARY] Proposal for Spark Release Strategy
:P - I'm pretty sure this can be done, but it will require some work.
We already use the GitHub API in our merge script, and we could hook
something like that up with the Jenkins tests. Henry, maybe you could
create a JIRA for this for Spark 1.0?

- Patrick

On Sat, Feb 8, 2014 at 3:20 PM, Mark Hamstra wrote:
> I know that it can be done -- which is different from saying that I know
> how to set it up.
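The check Patrick describes (hooking something like the merge script's GitHub API usage into the Jenkins tests) might look roughly like this. This is a hypothetical sketch, not the actual merge-script code; the function name and the warning text are invented:

```python
import re

# PR titles are expected to mention a ticket like "SPARK-1069" somewhere.
JIRA_ID = re.compile(r"\bSPARK-\d+\b")

def check_pr_title(title):
    """Return None if the title references a JIRA ticket, else a warning string."""
    if JIRA_ID.search(title):
        return None
    return "Warning: PR title does not reference a SPARK-XXXX JIRA ticket"

# A Jenkins hook could fetch the title through the GitHub API and post the
# warning back as a comment on the pull request.
```

For example, `check_pr_title("SPARK-1069: Establish compatibility goal")` passes (returns None), while a title with no ticket reference produces the warning.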
Re: [SUMMARY] Proposal for Spark Release Strategy
I know that it can be done -- which is different from saying that I know
how to set it up.

> On Feb 8, 2014, at 2:57 PM, Henry Saputra wrote:
>
> Patrick, do you know if there is a way to check whether a GitHub PR's
> subject/title contains a JIRA number, and have Jenkins raise a warning
> if it doesn't?
>
> - Henry
Re: [SUMMARY] Proposal for Spark Release Strategy
Patrick, do you know if there is a way to check whether a GitHub PR's
subject/title contains a JIRA number, and have Jenkins raise a warning if
it doesn't?

- Henry
Re: [SUMMARY] Proposal for Spark Release Strategy
Thanks for the summary, Patrick. I'm glad that we discussed the options
before pulling the trigger on a version number update (my -1 had only been
about committing a major version update without thorough discussion). IMO
that's been addressed, and given the discussion, I'm changing to a +1 for
1.0.0.
[SUMMARY] Proposal for Spark Release Strategy
Hey All,

Thanks to everyone who participated in this thread. I've distilled
feedback based on the discussion and wanted to summarize the conclusions:

- People seem universally +1 on semantic versioning in general.

- People seem universally +1 on having a public merge window for releases.

- People seem universally +1 on a policy of having associated JIRAs with
features.

- Everyone believes link-level compatibility should be the goal. Some
people think we should outright promise it now. Others think we should
either not promise it or promise it later.
--> Compromise: let's do one minor release, 1.0 -> 1.1, to convince
ourselves this is possible (some issues with Scala traits will make this
tricky). Then we can codify it in writing. I've created SPARK-1069 [1] to
clearly establish that this is the goal for the 1.X family of releases.

- Some people think we should add particular features before having 1.0.
--> Version 1.X indicates API stability rather than a feature set; this
was clarified.
--> That said, people still have several months to work on features if
they really want to get them in for this release.

I'm going to integrate this feedback and post a tentative version of the
release guidelines to the wiki.

With all this said, I would like to move the master version to
1.0.0-SNAPSHOT, as the main concerns with this have been addressed and
clarified. This merely represents a tentative consensus, and the release
is still subject to a formal vote amongst PMC members.

[1] https://spark-project.atlassian.net/browse/SPARK-1069

- Patrick
Re: Proposal for Spark Release Strategy
Will,

Thanks for these thoughts - this is something we should try to be
attentive to in the way we think about versioning. (2)-(5) are pretty
consistent with the guidelines we already follow. I think the biggest
proposed difference is to be conscious of (1), which at least I had not
given much thought to in the past. Specifically, if we make major version
upgrades of dependencies within a major release of Spark, it can cause
issues for downstream packagers. I can't easily recall how often we do
this or whether this will be hard for us to guarantee (maybe others
can...). It's something to keep in mind though - thanks for bringing it
up.

- Patrick
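Guideline (1), which Patrick calls out above, could in principle be checked mechanically between two releases. A minimal sketch, assuming dependencies are given as simple name-to-version maps; the helper names and inputs are illustrative, not part of any actual Spark tooling:

```python
def major_version(version):
    """First numeric component of a dotted version string, e.g. "7" from "7.6.8"."""
    return int(version.split(".")[0])

def incompatible_bumps(old_deps, new_deps):
    """Names of dependencies whose major version changed between two releases.

    Removed dependencies are ignored, per guideline (5); newly added ones
    are ignored here because guideline (2) covers them separately.
    """
    return sorted(
        name
        for name, old in old_deps.items()
        if name in new_deps and major_version(new_deps[name]) != major_version(old)
    )
```

For example, `incompatible_bumps({"akka": "2.2.3", "protobuf": "2.4.1"}, {"akka": "2.2.3", "protobuf": "3.0.0"})` returns `["protobuf"]`, flagging a change that under guideline (1) should wait for a major release.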
Re: Proposal for Spark Release Strategy
Semantic versioning is great, and I think the proposed extensions for
adopting it in Spark make a lot of sense. However, by focusing strictly on
public APIs, semantic versioning only solves part of the problem (albeit
certainly the most interesting part). I'd like to raise another issue that
the semantic versioning guidelines explicitly exclude: the relative
stability of dependencies and dependency versions. This is less of a
concern for end-users than it is for downstream packagers, but I believe
that the relative stability of a dependency stack *should* be part of what
is implied by a major version number.

Here are some suggestions for how to incorporate dependency stack
versioning into semantic versioning in order to make life easier for
downstreams; please consider all of these to be prefaced with "If at all
possible,":

1. Switching a dependency to an incompatible version should be reserved
for major releases. In general, downstream operating system distributions
support only one version of each library, although in rare cases alternate
versions are available for backwards compatibility. If a bug fix or
feature addition in a patch or minor release depends on adopting a version
of some library that is incompatible with the one used by the prior patch
or minor release, then downstreams may not be able to incorporate the fix
or functionality until every package impacted by the dependency can be
updated to work with the new version.

2. New dependencies should only be introduced with new features (and thus
with new minor versions). This suggestion is probably uncontroversial,
since features are more likely than bugfixes to require additional
external libraries.

3. The scope of new dependencies should be proportional to the benefit
that they provide. Of course, we want to avoid reinventing the wheel, but
if the alternative is pulling in a framework for WheelFactory generation,
a WheelContainer library, and a dozen transitive dependencies, maybe it's
worth considering reinventing at least the simplest and least general
wheels.

4. If new functionality requires additional dependencies, it should be
developed to work with the most recent stable version of those libraries
that is generally available. Again, since downstreams typically support
only one version per library at a time, this will make their job easier.
(This will benefit everyone, though, since the most recent version of some
dependency is more likely to see active maintenance efforts.)

5. Dependencies can be removed at any time.

I hope these can be a starting point for further discussion and adoption
of practices that demarcate the scope of dependency changes in a given
version stream.

best,
wb

----- Original Message -----
> From: "Patrick Wendell"
> To: dev@spark.incubator.apache.org
> Sent: Wednesday, February 5, 2014 6:20:10 PM
> Subject: Proposal for Spark Release Strategy
>
> Hi Everyone,
>
> In an effort to coordinate development amongst the growing list of
> Spark contributors, I've taken some time to write up a proposal to
> formalize various pieces of the development process. The next release
> of Spark will likely be Spark 1.0.0, so this message is intended in
> part to coordinate the release plan for 1.0.0 and future releases.
> I'll post this on the wiki after discussing it on this thread as
> tentative project guidelines.
>
> == Spark Release Structure ==
> Starting with Spark 1.0.0, the Spark project will follow the semantic
> versioning guidelines (http://semver.org/) with a few deviations.
> These small differences account for Spark's nature as a multi-module
> project.
>
> Each Spark release will be versioned:
> [MAJOR].[MINOR].[MAINTENANCE]
>
> All releases with the same major version number will have API
> compatibility, defined as [1]. Major version numbers will remain
> stable over long periods of time. For instance, 1.X.Y may last 1 year
> or more.
>
> Minor releases will typically contain new features and improvements.
> The target frequency for minor releases is every 3-4 months. One
> change we'd like to make is to announce fixed release dates and merge
> windows for each release, to facilitate coordination. Each minor
> release will have a merge window where new patches can be merged, a QA
> window when only fixes can be merged, then a final period where voting
> occurs on release candidates. These windows will be announced
> immediately after the previous minor release to give people plenty of
> time, and over time, we might make the whole release process more
> regular (similar to Ubuntu). At the bottom of this document is an
> example window for the 1.0.0 release.
>
> Maintenance releases will occur more frequently and depend on specific
> patches introduced (e.g. bug fixes) and
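The [MAJOR].[MINOR].[MAINTENANCE] rule quoted above can be made concrete with a tiny sketch. The helper names are invented, and treating a "-SNAPSHOT" suffix as ignorable is an assumption based on the 1.0.0-SNAPSHOT convention used elsewhere in the thread:

```python
def parse_version(v):
    """Split "1.0.0" or "1.0.0-SNAPSHOT" into (major, minor, maintenance)."""
    core = v.split("-", 1)[0]  # drop a -SNAPSHOT or -rcN suffix, if present
    major, minor, maintenance = (int(part) for part in core.split("."))
    return major, minor, maintenance

def api_compatible(a, b):
    """Releases sharing a major version number promise API compatibility."""
    return parse_version(a)[0] == parse_version(b)[0]
```

So `api_compatible("1.0.0", "1.1.0")` is True, while `api_compatible("0.9.1", "1.0.0")` is False, matching the proposal's promise that the 1.X family stays compatible.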
Re: Proposal for Spark Release Strategy
I'm not sure that that is the conclusion that I would draw from the Hadoop
example. I would certainly agree that maintaining and supporting both an
old and a new API is a cause of endless confusion for users. If we are
going to change or drop things from the API to reach 1.0, then we
shouldn't be maintaining and supporting the prior way of doing things
beyond a 1.0.0 -> 1.1.0 deprecation cycle.

On Thu, Feb 6, 2014 at 12:49 PM, Sandy Ryza wrote:
> If the APIs are usable, stability and continuity are much more important
> than perfection. With many already relying on the current APIs, I think
> trying to clean them up will just cause pain for users and integrators.
> Hadoop made this mistake when they decided the original MapReduce APIs
> were ugly and introduced a new set of APIs to do the same thing. Even
> though this happened in a pre-1.0 release, three years down the road,
> both the old and new APIs are still supported, causing endless confusion
> for users. If individual functions or configuration properties have
> unclear names, they can be deprecated and replaced, but redoing the APIs
> or breaking compatibility at this point is simply not worth it.
>
> On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid wrote:
>
>> I don't really agree with this logic. I think we haven't broken the API
>> so far because we just keep adding stuff on to it, and we haven't
>> bothered to clean the API up, specifically to *avoid* breaking things.
>> Here's a handful of API-breaking things that we might want to consider:
>>
>> * should we look at all the various configuration properties, and maybe
>> some of them should get renamed for consistency / clarity?
>> * do all of the functions on RDD need to be in core? or do some of them
>> that are simple additions built on top of the primitives really belong
>> in a "utils" package or something? E.g., maybe we should get rid of all
>> the variants of mapPartitions / mapWith / etc. and just have map and
>> mapPartitionsWithIndex (too many choices in the API can also be
>> confusing to the user)
>> * are the right things getting tracked in SparkListener? Do we need to
>> add or remove anything?
>>
>> This is probably not the right list of questions; that's just an idea
>> of the kind of thing we should be thinking about.
>>
>> It's also fine with me if 1.0 is next, I just think that we ought to be
>> asking these kinds of questions up and down the entire API before we
>> release 1.0. And given that we haven't even started that discussion, it
>> seems possible that there could be new features we'd like to release in
>> 0.10 before that discussion is finished.
>>
>> On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia wrote:
>>
>>> I think it's important to do 1.0 next. The project has been around for
>>> 4 years, and I'd be comfortable maintaining the current codebase for a
>>> long time in an API- and binary-compatible way through 1.x releases.
>>> Over the past 4 years we haven't actually had major changes to the
>>> user-facing API -- the only ones were changing the package to
>>> org.apache.spark, and upgrading the Scala version. I'd be okay leaving
>>> 1.x to always use Scala 2.10, for example, or later cross-building it
>>> for Scala 2.11. Updating to 1.0 says two things: it tells users that
>>> they can be confident that version will be maintained for a long time,
>>> which we absolutely want to do, and it lets outsiders see that the
>>> project is now fairly mature (for many people, pre-1.0 might still
>>> cause them not to try it). I think both are good for the community.
>>>
>>> Regarding binary compatibility, I agree that it's what we should
>>> strive for, but it just seems premature to codify now. Let's see how
>>> it works between, say, 1.0 and 1.1, and then we can codify it.
>>>
>>> Matei
>>>
>>> On Feb 6, 2014, at 10:43 AM, Henry Saputra wrote:
>>>
>>>> Thanks Patrick for initiating the discussion about the next road map
>>>> for Apache Spark.
>>>>
>>>> I am +1 for 0.10.0 as the next version.
>>>>
>>>> It will give us as a community some time to digest the process and
>>>> the vision and make adjustments accordingly.
>>>>
>>>> Releasing a 1.0.0 is a huge milestone, and if we do need to break the
>>>> API somehow or modify internal behavior dramatically, we could take
>>>> advantage of a 1.0.0 release as a good step to go to.
>>>>
>>>> - Henry
>>>>
>>>> On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash wrote:
>>>>> Agree on timeboxed releases as well.
>>>>>
>>>>> Is there a vision for where we want to be as a project before
>>>>> declaring the first 1.0 release? While we're in the 0.x days per
>>>>> semver we can break backcompat at will (though we try to avoid it
>>>>> where possible), and that luxury goes away with 1.x. I just don't
>>>>> want to release a 1.0 simply
Re: Proposal for Spark Release Strategy
I think these are good questions to bring up, Imran. Here are my thoughts on them (I’ve thought about some of these in the past): On Feb 6, 2014, at 12:39 PM, Imran Rashid wrote: > I don't really agree with this logic. I think we haven't broken API so far > because we just keep adding stuff on to it, and we haven't bothered to > clean the api up, specifically to *avoid* breaking things. Here's a > handful of api breaking things that we might want to consider: > > * should we look at all the various configuration properties, and maybe > some of them should get renamed for consistency / clarity? I know that some names are suboptimal, but I absolutely detest breaking APIs, config names, etc. I’ve seen it happen way too often in other projects (even things we depend on that are officially post-1.0, like Akka or Protobuf or Hadoop), and it’s very painful. I think that we as fairly cutting-edge users are okay with libraries occasionally changing, but many others will consider it a show-stopper. Given this, I think that any cosmetic change now, even though it might improve clarity slightly, is not worth the tradeoff in terms of creating an update barrier for existing users. > * do all of the functions on RDD need to be in core? or do some of them > that are simple additions built on top of the primitives really belong in a > "utils" package or something? Eg., maybe we should get rid of all the > variants of the mapPartitions / mapWith / etc. just have map, and > mapPartitionsWithIndex (too many choices in the api can also be confusing > to the user) Again, for the reason above, I’d keep them where they are and consider adding other stuff later. Also personally I want to optimize the API for usability, not for Spark developers. If it’s easier for a user to call RDD.mapPartitions instead of AdvancedUtils.mapPartitions(rdd, func), and the only cost is a longer RDD.scala class, I’d go for the former. 
If you think there are some API methods that should just go away, that would be good to discuss — we can deprecate them for example. > * are the right things getting tracked in SparkListener? Do we need to add > or remove anything? This is an API that will probably be experimental or semi-private at first. Anyway, as I said, these are good questions — I’d be happy to see suggestions on any of these fronts. I just wanted to point out the importance of compatibility. I think it’s been awesome that most of our users have been able to keep up with the latest version of Spark, getting all the new fixes and simultaneously increasing the amount of contributions we get on master and decreasing the backporting burden on old branches. We might take it for granted, but I’ve seen similar projects that didn't manage to do this. In particular, compatibility in Hadoop has been a mess, with some major users diverging from Apache early (e.g. Facebook) and never being able to contribute back, and with big API cleanups (e.g. mapred -> mapreduce) being proposed after the project already had a lot of momentum and never making it through. The experience of seeing those has made me very conservative. The longer we can keep a unified community, the better it will be for all users of the project. Matei > > This is probably not the right list of questions, that's just an idea of > the kind of thing we should be thinking about. > > Its also fine with me if 1.0 is next, I just think that we ought to be > asking these kinds of questions up and down the entire api before we > release 1.0. And given that we haven't even started that discussion, it > seems possible that there could be new features we'd like to release in > 0.10 before that discussion is finished. > > > > On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia wrote: > >> I think it's important to do 1.0 next. 
The project has been around for 4 >> years, and I'd be comfortable maintaining the current codebase for a long >> time in an API and binary compatible way through 1.x releases. Over the >> past 4 years we haven't actually had major changes to the user-facing API -- >> the only ones were changing the package to org.apache.spark, and upgrading >> the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for >> example, or later cross-building it for Scala 2.11. Updating to 1.0 says >> two things: it tells users that they can be confident that version will be >> maintained for a long time, which we absolutely want to do, and it lets >> outsiders see that the project is now fairly mature (for many people, >> pre-1.0 might still cause them not to try it). I think both are good for >> the community. >> >> Regarding binary compatibility, I agree that it's what we should strive >> for, but it just seems premature to codify now. Let's see how it works >> between, say, 1.0 and 1.1, and then we can codify it. >> >> Matei >> >> On Feb 6, 2014, at 10:43 AM, Henry Saputra >> wrote: >> >>> Thanks Patick to initiate the discussion about ne
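Matei's point above about deprecating API methods rather than removing them can be illustrated with a small, self-contained Scala sketch. This is not actual Spark code — the `Collection` class and the simplified signatures are invented for illustration — but it shows the pattern under discussion: the old variant (here, a `mapWith`-like method) keeps compiling, emits a deprecation warning, and simply forwards to the preferred primitive (`mapPartitionsWithIndex`).

```scala
// Hypothetical sketch, not Spark source: deprecating a redundant API variant
// while keeping it source-compatible.
class Collection[T](val items: Seq[T]) {
  // The preferred primitive (single "partition" here, for illustration only).
  def mapPartitionsWithIndex[U](f: (Int, Iterator[T]) => Iterator[U]): Seq[U] =
    f(0, items.iterator).toSeq

  // The older variant stays callable but warns at compile time,
  // and forwards to the primitive instead of duplicating logic.
  @deprecated("use mapPartitionsWithIndex instead", "1.0.0")
  def mapWith[A, U](constructA: Int => A)(f: (T, A) => U): Seq[U] =
    mapPartitionsWithIndex { (i, iter) =>
      val a = constructA(i) // built once per partition, as in the old contract
      iter.map(t => f(t, a))
    }
}
```

Callers of the old method keep working across the release, and the warning points them at the replacement before the method is actually removed in a later major version.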
Re: Proposal for Spark Release Strategy
Imran: > Its also fine with me if 1.0 is next, I just think that we ought to be > asking these kinds of questions up and down the entire api before we > release 1.0. And moving master to 1.0.0-SNAPSHOT doesn't preclude that. If anything, it turns that "ought to" into "must" -- which is another way of saying what Reynold said: "The point of 1.0 is for us to self-enforce API compatibility in the context of longer term support. If we continue down the 0.xx road, we will always have excuse for breaking APIs." 1.0.0-SNAPSHOT doesn't mean that the API is final right now. It means that what is released next will be final over what is intended to be the lengthy scope of a major release. That means that adding new features and functionality (at least to core spark) should be a very low priority for this development cycle, and establishing the 1.0 API from what is already in 0.9.0 should be our first priority. It wouldn't trouble me at all if not-strictly-necessary new features were left to hang out on the pull request queue for quite awhile until we are ready to add them in 1.1.0, if we were to do pretty much nothing else during this cycle except to get the 1.0 API to where most of us agree that it is in good shape. If we're not adding new features and extending the 0.9.0 API, then there really is no need for a 0.10.0 minor-release, whose main purpose would be to collect the API additions from 0.9.0. Bug-fixes go in 0.9.1-SNAPSHOT; bug-fixes and finalized 1.0 API go in 1.0.0-SNAPSHOT; almost all new features are put on hold and wait for 1.1.0-SNAPSHOT. ... it seems possible that there could be new features we'd like to release > in 0.10... We certainly can add new features to 1.0.0, but they will have to go through a rigorous review to be certain that they are things that we really want to commit to keeping going forward. But after 1.0, that is true for any new feature proposal unless we create specifically experimental branches. 
So what moving to 1.0.0-SNAPSHOT really means is that we are saying that we have gone beyond the development phase where more-or-less experimental features can be added to Spark releases only to be withdrawn later -- that time is done after 1.0.0-SNAPSHOT. Now to be fair, tentative/experimental features have not been added willy-nilly to Spark over recent releases, and withdrawal/replacement has been about as limited in scope as could be fairly expected, so this shouldn't be a radically new and different development paradigm. There are, though, some experiments that were added in the past and should probably now be withdrawn (or at least deprecated in 1.0.0 and withdrawn in 1.1.0). I'll put my own contribution of mapWith, filterWith, et al. on the chopping block as an effort that, at least in its present form, doesn't provide enough extra over mapPartitionsWithIndex, and whose syntax is awkward enough that I don't believe these methods have ever been widely used, so that their inclusion in the 1.0 API is probably not warranted. There are other elements of Spark that also should be culled and/or refactored before 1.0. Imran has listed a few. I'll also suggest that there are at least parts of alternative Broadcast variable implementations that should probably be left behind. In any event, Imran is absolutely correct that we need to have a discussion about these issues. Moving to 1.0.0-SNAPSHOT forces us to begin that discussion. So, I'm +1 for 1.0.0-incubating-SNAPSHOT (and looking forward to losing the "incubating"!) On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid wrote: > I don't really agree with this logic. I think we haven't broken API so far > because we just keep adding stuff on to it, and we haven't bothered to > clean the api up, specifically to *avoid* breaking things. 
Here's a > handful of api breaking things that we might want to consider: > > * should we look at all the various configuration properties, and maybe > some of them should get renamed for consistency / clarity? > * do all of the functions on RDD need to be in core? or do some of them > that are simple additions built on top of the primitives really belong in a > "utils" package or something? Eg., maybe we should get rid of all the > variants of the mapPartitions / mapWith / etc. just have map, and > mapPartitionsWithIndex (too many choices in the api can also be confusing > to the user) > * are the right things getting tracked in SparkListener? Do we need to add > or remove anything? > > This is probably not the right list of questions, that's just an idea of > the kind of thing we should be thinking about. > > Its also fine with me if 1.0 is next, I just think that we ought to be > asking these kinds of questions up and down the entire api before we > release 1.0. And given that we haven't even started that discussion, it > seems possible that there could be new features we'd like to release in > 0.10 before that discussion is finished. > > > > On Thu, Feb 6, 2014 at 12:56 PM, Matei Z
Re: Proposal for Spark Release Strategy
Just to echo others - The relevant question is whether we want to advertise stable APIs for users that we will support for a long time horizon. And doing this is critical to being taken seriously as a mature project. The question is not whether or not there are things we want to improve about Spark (further reduce dependencies, runtime stability, etc) - of course everyone wants to improve those things! In the next few months ahead of 1.0 the plan would be to invest effort in finishing off loose ends in the API and of course, no 1.0 release candidate will pass muster if these aren't addressed. I only see a few fairly small blockers though wrt API issues: - We should mark things that may evolve and change as semi-private developer APIs (e.g. the Spark Listener). - We need to standardize the Java API in a way that supports Java 8 lambdas. Other than that - I don't see many blockers in terms of API changes we might want to make. A lot of those were dealt with in 0.9 specifically to prepare for this. The broader question of API "clean-up" brings up a debate about the trade-off of compatibility with older pre-1.0 versions of Spark. This is not the primary issue under discussion and can be debated separately. The primary issue at hand is whether to have 1.0 in ~3 months vs pushing it to ~6 months from now or more. - Patrick On Thu, Feb 6, 2014 at 12:49 PM, Sandy Ryza wrote: > If the APIs are usable, stability and continuity are much more important > than perfection. With many already relying on the current APIs, I think > trying to clean them up will just cause pain for users and integrators. > Hadoop made this mistake when they decided the original MapReduce APIs > were ugly and introduced a new set of APIs to do the same thing. Even > though this happened in a pre-1.0 release, three years down the road, both > the old and new APIs are still supported, causing endless confusion for > users. 
If individual functions or configuration properties have unclear > names, they can be deprecated and replaced, but redoing the APIs or > breaking compatibility at this point is simply not worth it. > > > On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid wrote: > >> I don't really agree with this logic. I think we haven't broken API so far >> because we just keep adding stuff on to it, and we haven't bothered to >> clean the api up, specifically to *avoid* breaking things. Here's a >> handful of api breaking things that we might want to consider: >> >> * should we look at all the various configuration properties, and maybe >> some of them should get renamed for consistency / clarity? >> * do all of the functions on RDD need to be in core? or do some of them >> that are simple additions built on top of the primitives really belong in a >> "utils" package or something? Eg., maybe we should get rid of all the >> variants of the mapPartitions / mapWith / etc. just have map, and >> mapPartitionsWithIndex (too many choices in the api can also be confusing >> to the user) >> * are the right things getting tracked in SparkListener? Do we need to add >> or remove anything? >> >> This is probably not the right list of questions, that's just an idea of >> the kind of thing we should be thinking about. >> >> Its also fine with me if 1.0 is next, I just think that we ought to be >> asking these kinds of questions up and down the entire api before we >> release 1.0. And given that we haven't even started that discussion, it >> seems possible that there could be new features we'd like to release in >> 0.10 before that discussion is finished. >> >> >> >> On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia > >wrote: >> >> > I think it's important to do 1.0 next. The project has been around for 4 >> > years, and I'd be comfortable maintaining the current codebase for a long >> > time in an API and binary compatible way through 1.x releases. 
Over the >> > past 4 years we haven't actually had major changes to the user-facing >> API -- >> > the only ones were changing the package to org.apache.spark, and >> upgrading >> > the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for >> > example, or later cross-building it for Scala 2.11. Updating to 1.0 says >> > two things: it tells users that they can be confident that version will >> be >> > maintained for a long time, which we absolutely want to do, and it lets >> > outsiders see that the project is now fairly mature (for many people, >> > pre-1.0 might still cause them not to try it). I think both are good for >> > the community. >> > >> > Regarding binary compatibility, I agree that it's what we should strive >> > for, but it just seems premature to codify now. Let's see how it works >> > between, say, 1.0 and 1.1, and then we can codify it. >> > >> > Matei >> > >> > On Feb 6, 2014, at 10:43 AM, Henry Saputra >> > wrote: >> > >> > > Thanks Patick to initiate the discussion about next road map for Apache >> > Spark. >> > > >> > > I am +1 for 0.10.0 for next version. >>
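Sandy's point above — that configuration properties with unclear names can be deprecated and replaced rather than broken — might look like the following sketch. The `Conf` class and the key names are invented for illustration; this is not Spark's actual SparkConf, just the fallback pattern being described.

```scala
// Hypothetical sketch: honor a renamed config key without breaking old user configs.
class Conf(settings: Map[String, String]) {
  // Old (deprecated) key -> new key. These names are made up for illustration.
  private val renamed = Map("spark.shuffle.oldName" -> "spark.shuffle.newName")

  def get(key: String, default: String): String =
    settings.get(key).orElse {
      // Fall back to a deprecated alias of `key` if the user still sets it.
      renamed.collectFirst {
        case (oldKey, `key`) if settings.contains(oldKey) => settings(oldKey)
      }
    }.getOrElse(default)
}
```

A real implementation would also log a deprecation warning when the old key is read, so users migrate before the alias is eventually dropped.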
Re: Proposal for Spark Release Strategy
If the APIs are usable, stability and continuity are much more important than perfection. With many already relying on the current APIs, I think trying to clean them up will just cause pain for users and integrators. Hadoop made this mistake when they decided the original MapReduce APIs were ugly and introduced a new set of APIs to do the same thing. Even though this happened in a pre-1.0 release, three years down the road, both the old and new APIs are still supported, causing endless confusion for users. If individual functions or configuration properties have unclear names, they can be deprecated and replaced, but redoing the APIs or breaking compatibility at this point is simply not worth it. On Thu, Feb 6, 2014 at 12:39 PM, Imran Rashid wrote: > I don't really agree with this logic. I think we haven't broken API so far > because we just keep adding stuff on to it, and we haven't bothered to > clean the api up, specifically to *avoid* breaking things. Here's a > handful of api breaking things that we might want to consider: > > * should we look at all the various configuration properties, and maybe > some of them should get renamed for consistency / clarity? > * do all of the functions on RDD need to be in core? or do some of them > that are simple additions built on top of the primitives really belong in a > "utils" package or something? Eg., maybe we should get rid of all the > variants of the mapPartitions / mapWith / etc. just have map, and > mapPartitionsWithIndex (too many choices in the api can also be confusing > to the user) > * are the right things getting tracked in SparkListener? Do we need to add > or remove anything? > > This is probably not the right list of questions, that's just an idea of > the kind of thing we should be thinking about. > > Its also fine with me if 1.0 is next, I just think that we ought to be > asking these kinds of questions up and down the entire api before we > release 1.0. 
And given that we haven't even started that discussion, it > seems possible that there could be new features we'd like to release in > 0.10 before that discussion is finished. > > > > On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia >wrote: > > > I think it's important to do 1.0 next. The project has been around for 4 > > years, and I'd be comfortable maintaining the current codebase for a long > > time in an API and binary compatible way through 1.x releases. Over the > > past 4 years we haven't actually had major changes to the user-facing > API -- > > the only ones were changing the package to org.apache.spark, and > upgrading > > the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for > > example, or later cross-building it for Scala 2.11. Updating to 1.0 says > > two things: it tells users that they can be confident that version will > be > > maintained for a long time, which we absolutely want to do, and it lets > > outsiders see that the project is now fairly mature (for many people, > > pre-1.0 might still cause them not to try it). I think both are good for > > the community. > > > > Regarding binary compatibility, I agree that it's what we should strive > > for, but it just seems premature to codify now. Let's see how it works > > between, say, 1.0 and 1.1, and then we can codify it. > > > > Matei > > > > On Feb 6, 2014, at 10:43 AM, Henry Saputra > > wrote: > > > > > Thanks Patick to initiate the discussion about next road map for Apache > > Spark. > > > > > > I am +1 for 0.10.0 for next version. > > > > > > It will give us as community some time to digest the process and the > > > vision and make adjustment accordingly. > > > > > > Release a 1.0.0 is a huge milestone and if we do need to break API > > > somehow or modify internal behavior dramatically we could take > > > advantage to release 1.0.0 as good step to go to. 
> > > > > > > > > - Henry > > > > > > > > > > > > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash > wrote: > > >> Agree on timeboxed releases as well. > > >> > > >> Is there a vision for where we want to be as a project before > declaring > > the > > >> first 1.0 release? While we're in the 0.x days per semver we can > break > > >> backcompat at will (though we try to avoid it where possible), and > that > > >> luxury goes away with 1.x I just don't want to release a 1.0 simply > > >> because it seems to follow after 0.9 rather than making an intentional > > >> decision that we're at the point where we can stand by the current > APIs > > and > > >> binary compatibility for the next year or so of the major release. > > >> > > >> Until that decision is made as a group I'd rather we do an immediate > > >> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it > > later, > > >> replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to > 1.0 > > >> but not the other way around. > > >> > > >> https://github.com/apache/incubator-spark/pull/542 > > >> > > >> Cheers! > > >> Andrew > > >> > > >> > > >> On Wed, Feb 5, 2014 at
Re: Proposal for Spark Release Strategy
I don't really agree with this logic. I think we haven't broken API so far because we just keep adding stuff on to it, and we haven't bothered to clean the api up, specifically to *avoid* breaking things. Here's a handful of api breaking things that we might want to consider: * should we look at all the various configuration properties, and maybe some of them should get renamed for consistency / clarity? * do all of the functions on RDD need to be in core? or do some of them that are simple additions built on top of the primitives really belong in a "utils" package or something? Eg., maybe we should get rid of all the variants of the mapPartitions / mapWith / etc. just have map, and mapPartitionsWithIndex (too many choices in the api can also be confusing to the user) * are the right things getting tracked in SparkListener? Do we need to add or remove anything? This is probably not the right list of questions, that's just an idea of the kind of thing we should be thinking about. Its also fine with me if 1.0 is next, I just think that we ought to be asking these kinds of questions up and down the entire api before we release 1.0. And given that we haven't even started that discussion, it seems possible that there could be new features we'd like to release in 0.10 before that discussion is finished. On Thu, Feb 6, 2014 at 12:56 PM, Matei Zaharia wrote: > I think it's important to do 1.0 next. The project has been around for 4 > years, and I'd be comfortable maintaining the current codebase for a long > time in an API and binary compatible way through 1.x releases. Over the > past 4 years we haven't actually had major changes to the user-facing API -- > the only ones were changing the package to org.apache.spark, and upgrading > the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for > example, or later cross-building it for Scala 2.11. 
Updating to 1.0 says > two things: it tells users that they can be confident that version will be > maintained for a long time, which we absolutely want to do, and it lets > outsiders see that the project is now fairly mature (for many people, > pre-1.0 might still cause them not to try it). I think both are good for > the community. > > Regarding binary compatibility, I agree that it's what we should strive > for, but it just seems premature to codify now. Let's see how it works > between, say, 1.0 and 1.1, and then we can codify it. > > Matei > > On Feb 6, 2014, at 10:43 AM, Henry Saputra > wrote: > > > Thanks Patick to initiate the discussion about next road map for Apache > Spark. > > > > I am +1 for 0.10.0 for next version. > > > > It will give us as community some time to digest the process and the > > vision and make adjustment accordingly. > > > > Release a 1.0.0 is a huge milestone and if we do need to break API > > somehow or modify internal behavior dramatically we could take > > advantage to release 1.0.0 as good step to go to. > > > > > > - Henry > > > > > > > > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash wrote: > >> Agree on timeboxed releases as well. > >> > >> Is there a vision for where we want to be as a project before declaring > the > >> first 1.0 release? While we're in the 0.x days per semver we can break > >> backcompat at will (though we try to avoid it where possible), and that > >> luxury goes away with 1.x I just don't want to release a 1.0 simply > >> because it seems to follow after 0.9 rather than making an intentional > >> decision that we're at the point where we can stand by the current APIs > and > >> binary compatibility for the next year or so of the major release. > >> > >> Until that decision is made as a group I'd rather we do an immediate > >> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it > later, > >> replace that with 1.0.0-SNAPSHOT. 
It's very easy to go from 0.10 to 1.0 > >> but not the other way around. > >> > >> https://github.com/apache/incubator-spark/pull/542 > >> > >> Cheers! > >> Andrew > >> > >> > >> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun >wrote: > >> > >>> +1 on time boxed releases and compatibility guidelines > >>> > >>> > Am 06.02.2014 um 01:20 schrieb Patrick Wendell : > > Hi Everyone, > > In an effort to coordinate development amongst the growing list of > Spark contributors, I've taken some time to write up a proposal to > formalize various pieces of the development process. The next release > of Spark will likely be Spark 1.0.0, so this message is intended in > part to coordinate the release plan for 1.0.0 and future releases. > I'll post this on the wiki after discussing it on this thread as > tentative project guidelines. > > == Spark Release Structure == > Starting with Spark 1.0.0, the Spark project will follow the semantic > versioning guidelines (http://semver.org/) with a few deviations. > These small differences account for Spark's nature as a multi-module > project.
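The semantic-versioning contract Patrick's proposal references can be made concrete with a small sketch. This is our own illustration of the semver rules (the names `Version` and `apiStableBetween` are invented, and it ignores the multi-module deviations the proposal mentions): releases sharing a major version promise API compatibility, while in the 0.x series every minor release may break the API.

```scala
// Illustrative semver sketch: what "compatible" means under MAJOR.MINOR.PATCH.
case class Version(major: Int, minor: Int, patch: Int)

object Version {
  // Parse strings like "1.0.0", "1.0.0-SNAPSHOT", or "0.9.1-incubating"
  // by keeping only the leading numeric MAJOR.MINOR.PATCH part.
  def parse(s: String): Version = {
    val Array(ma, mi, pa) =
      s.takeWhile(c => c.isDigit || c == '.').split('.').map(_.toInt)
    Version(ma, mi, pa)
  }
}

// API stability promise between two releases under semver:
// same major version => compatible; 0.x => only same-minor is safe.
def apiStableBetween(a: Version, b: Version): Boolean =
  if (a.major == 0 || b.major == 0) a.major == b.major && a.minor == b.minor
  else a.major == b.major
```

This is exactly why the 0.9 -> 0.10 vs. 1.0 decision matters: under the 0.x rules nothing binds the project across minor releases, while 1.0 -> 1.1 carries a compatibility promise.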
Re: Proposal for Spark Release Strategy
+1 for 1.0 The point of 1.0 is for us to self-enforce API compatibility in the context of longer term support. If we continue down the 0.xx road, we will always have excuse for breaking APIs. That said, a major focus of 0.9 and some of the work that is happening for 1.0 (e.g. configuration, Java 8 closure support, security) is for better API compatibility support in 1.x releases. While not perfect, Spark as-is is already more mature than many (ASF) projects that are versioned 1.x, 2.x, or even 10.x. Software releases are always a moving target. 1.0 doesn't mean it is "perfect" and "final". The project will still evolve. On Thu, Feb 6, 2014 at 11:54 AM, Evan Chan wrote: > +1 for 0.10.0. > > It would give more time to study things (such as the new SparkConf) > and let the community decide if any breaking API changes are needed. > > Also, a +1 for minor revisions not breaking code compatibility, > including Scala versions. (I guess this would mean that 1.x would > stay on Scala 2.10.x) > > On Thu, Feb 6, 2014 at 11:05 AM, Sandy Ryza > wrote: > > Bleh, hit send to early again. My second paragraph was to argue for > 1.0.0 > > instead of 0.10.0, not to hammer on the binary compatibility point. > > > > > > On Thu, Feb 6, 2014 at 11:04 AM, Sandy Ryza > wrote: > > > >> *Would it make sense to put in something that strongly discourages > binary > >> incompatible changes when possible? > >> > >> > >> On Thu, Feb 6, 2014 at 11:03 AM, Sandy Ryza >wrote: > >> > >>> Not codifying binary compatibility as a hard rule sounds fine to me. > >>> Would it make sense to put something in that . I.e. avoid making > needless > >>> changes to class hierarchies. > >>> > >>> Whether Spark considers itself stable or not, users are beginning to > >>> treat it so. A responsible project will acknowledge this and provide > the > >>> stability needed by its user base. I think some projects have made the > >>> mistake of waiting too long to release a 1.0.0. 
It allows them to put > off > >>> making the hard decisions, but users and downstream projects suffer. > >>> > >>> If Spark needs to go through dramatic changes, there's always the > option > >>> of a 2.0.0 that allows for this. > >>> > >>> -Sandy > >>> > >>> > >>> > >>> On Thu, Feb 6, 2014 at 10:56 AM, Matei Zaharia < > matei.zaha...@gmail.com>wrote: > >>> > I think it's important to do 1.0 next. The project has been around > for 4 > years, and I'd be comfortable maintaining the current codebase for a > long > time in an API and binary compatible way through 1.x releases. Over > the > past 4 years we haven't actually had major changes to the user-facing > API -- > the only ones were changing the package to org.apache.spark, and > upgrading > the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 > for > example, or later cross-building it for Scala 2.11. Updating to 1.0 > says > two things: it tells users that they can be confident that version > will be > maintained for a long time, which we absolutely want to do, and it > lets > outsiders see that the project is now fairly mature (for many people, > pre-1.0 might still cause them not to try it). I think both are good > for > the community. > > Regarding binary compatibility, I agree that it's what we should > strive > for, but it just seems premature to codify now. Let's see how it works > between, say, 1.0 and 1.1, and then we can codify it. > > Matei > > On Feb 6, 2014, at 10:43 AM, Henry Saputra > wrote: > > > Thanks Patick to initiate the discussion about next road map for > Apache Spark. > > > > I am +1 for 0.10.0 for next version. > > > > It will give us as community some time to digest the process and the > > vision and make adjustment accordingly. > > > > Release a 1.0.0 is a huge milestone and if we do need to break API > > somehow or modify internal behavior dramatically we could take > > advantage to release 1.0.0 as good step to go to. 
> > > > > > - Henry > > > > > > > > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash > wrote: > >> Agree on timeboxed releases as well. > >> > >> Is there a vision for where we want to be as a project before > declaring the > >> first 1.0 release? While we're in the 0.x days per semver we can > break > >> backcompat at will (though we try to avoid it where possible), and > that > >> luxury goes away with 1.x I just don't want to release a 1.0 > simply > >> because it seems to follow after 0.9 rather than making an > intentional > >> decision that we're at the point where we can stand by the current > APIs and > >> binary compatibility for the next year or so of the major release. > >> > >> Until that decision is made as a group I'd rather we do an > immediate > >> version bump to 0
Re: Proposal for Spark Release Strategy
On Feb 6, 2014, at 11:56 AM, Evan Chan wrote: > The other reason for waiting are things like stability. > > It would be great to have as a goal for 1.0.0 that under most heavy > use scenarios, workers and executors don't just die, which is not true > today. > Also, there should be minimal "silent failures" which are difficult to debug. > I think this is orthogonal to the version number. 1.x versions can have bugs — it’s almost unavoidable in the distributed system space. The version number is more about the level of compatibility and support people can expect, which I think is something we want to solidify. Calling it 1.x will also make it more likely that we have long-term maintenance releases, because with the current project, people expect that they have to keep jumping to the latest version. Just as an example, when we did a survey a while back, out of ~100 respondents, all were either on the very latest release or on master (!). I’ve had multiple people ask me about longer-term supported versions (e.g. if I download 1.x now, will it still have maintenance releases a year from now, or will it be left in the dust). Matei
Re: Proposal for Spark Release Strategy
On Feb 6, 2014, at 11:04 AM, Sandy Ryza wrote: > *Would it make sense to put in something that strongly discourages binary > incompatible changes when possible? Yes, I like this idea. Let’s just say we’ll strive for this as much as possible and think about codifying it after some experience doing this. Matei
Re: Proposal for Spark Release Strategy
The other reason for waiting are things like stability. It would be great to have as a goal for 1.0.0 that under most heavy use scenarios, workers and executors don't just die, which is not true today. Also, there should be minimal "silent failures" which are difficult to debug. On Thu, Feb 6, 2014 at 11:54 AM, Evan Chan wrote: > +1 for 0.10.0. > > It would give more time to study things (such as the new SparkConf) > and let the community decide if any breaking API changes are needed. > > Also, a +1 for minor revisions not breaking code compatibility, > including Scala versions. (I guess this would mean that 1.x would > stay on Scala 2.10.x) > > On Thu, Feb 6, 2014 at 11:05 AM, Sandy Ryza wrote: >> Bleh, hit send to early again. My second paragraph was to argue for 1.0.0 >> instead of 0.10.0, not to hammer on the binary compatibility point. >> >> >> On Thu, Feb 6, 2014 at 11:04 AM, Sandy Ryza wrote: >> >>> *Would it make sense to put in something that strongly discourages binary >>> incompatible changes when possible? >>> >>> >>> On Thu, Feb 6, 2014 at 11:03 AM, Sandy Ryza wrote: >>> Not codifying binary compatibility as a hard rule sounds fine to me. Would it make sense to put something in that . I.e. avoid making needless changes to class hierarchies. Whether Spark considers itself stable or not, users are beginning to treat it so. A responsible project will acknowledge this and provide the stability needed by its user base. I think some projects have made the mistake of waiting too long to release a 1.0.0. It allows them to put off making the hard decisions, but users and downstream projects suffer. If Spark needs to go through dramatic changes, there's always the option of a 2.0.0 that allows for this. -Sandy On Thu, Feb 6, 2014 at 10:56 AM, Matei Zaharia wrote: > I think it's important to do 1.0 next. 
The project has been around for 4 > years, and I'd be comfortable maintaining the current codebase for a long > time in an API and binary compatible way through 1.x releases. Over the > past 4 years we haven't actually had major changes to the user-facing API > -- > the only ones were changing the package to org.apache.spark, and upgrading > the Scala version. I'd be okay leaving 1.x to always use Scala 2.10 for > example, or later cross-building it for Scala 2.11. Updating to 1.0 says > two things: it tells users that they can be confident that version will be > maintained for a long time, which we absolutely want to do, and it lets > outsiders see that the project is now fairly mature (for many people, > pre-1.0 might still cause them not to try it). I think both are good for > the community. > > Regarding binary compatibility, I agree that it's what we should strive > for, but it just seems premature to codify now. Let's see how it works > between, say, 1.0 and 1.1, and then we can codify it. > > Matei > > On Feb 6, 2014, at 10:43 AM, Henry Saputra > wrote: > > > Thanks Patick to initiate the discussion about next road map for > Apache Spark. > > > > I am +1 for 0.10.0 for next version. > > > > It will give us as community some time to digest the process and the > > vision and make adjustment accordingly. > > > > Release a 1.0.0 is a huge milestone and if we do need to break API > > somehow or modify internal behavior dramatically we could take > > advantage to release 1.0.0 as good step to go to. > > > > > > - Henry > > > > > > > > On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash > wrote: > >> Agree on timeboxed releases as well. > >> > >> Is there a vision for where we want to be as a project before > declaring the > >> first 1.0 release? 
While we're in the 0.x days per semver we can > break > >> backcompat at will (though we try to avoid it where possible), and > that > >> luxury goes away with 1.x I just don't want to release a 1.0 simply > >> because it seems to follow after 0.9 rather than making an intentional > >> decision that we're at the point where we can stand by the current > APIs and > >> binary compatibility for the next year or so of the major release. > >> > >> Until that decision is made as a group I'd rather we do an immediate > >> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it > later, > >> replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to > 1.0 > >> but not the other way around. > >> > >> https://github.com/apache/incubator-spark/pull/542 > >> > >> Cheers! > >> Andrew > >> > >> > >> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun >wrote: > >> > >>> +1 on time boxed releases and compatibility guidelines > >>> > >>> > Am 06.02.2014 u
Re: Proposal for Spark Release Strategy
+1 for 0.10.0. It would give more time to study things (such as the new SparkConf) and let the community decide if any breaking API changes are needed. Also, a +1 for minor revisions not breaking code compatibility, including Scala versions. (I guess this would mean that 1.x would stay on Scala 2.10.x.)

On Thu, Feb 6, 2014 at 11:05 AM, Sandy Ryza wrote:
> Bleh, hit send too early again. My second paragraph was to argue for 1.0.0
> instead of 0.10.0, not to hammer on the binary compatibility point.
Re: Proposal for Spark Release Strategy
Not codifying binary compatibility as a hard rule sounds fine to me. Would it make sense to put in something that strongly discourages binary-incompatible changes when possible? I.e., avoid making needless changes to class hierarchies.

Whether Spark considers itself stable or not, users are beginning to treat it so. A responsible project will acknowledge this and provide the stability needed by its user base. I think some projects have made the mistake of waiting too long to release a 1.0.0. It allows them to put off making the hard decisions, but users and downstream projects suffer.

If Spark needs to go through dramatic changes, there's always the option of a 2.0.0 that allows for this.

-Sandy

On Thu, Feb 6, 2014 at 10:56 AM, Matei Zaharia wrote:
> I think it's important to do 1.0 next. The project has been around for 4
> years, and I'd be comfortable maintaining the current codebase for a long
> time in an API- and binary-compatible way through 1.x releases.
Re: Proposal for Spark Release Strategy
*Would it make sense to put in something that strongly discourages binary-incompatible changes when possible?

On Thu, Feb 6, 2014 at 11:03 AM, Sandy Ryza wrote:
> Not codifying binary compatibility as a hard rule sounds fine to me.
> Would it make sense to put in something that discourages them, i.e., avoid
> making needless changes to class hierarchies.
Re: Proposal for Spark Release Strategy
Bleh, hit send too early again. My second paragraph was to argue for 1.0.0 instead of 0.10.0, not to hammer on the binary compatibility point.

On Thu, Feb 6, 2014 at 11:04 AM, Sandy Ryza wrote:
> *Would it make sense to put in something that strongly discourages binary
> incompatible changes when possible?
Re: Proposal for Spark Release Strategy
I think it’s important to do 1.0 next. The project has been around for 4 years, and I’d be comfortable maintaining the current codebase for a long time in an API- and binary-compatible way through 1.x releases. Over the past 4 years we haven’t actually had major changes to the user-facing API; the only ones were changing the package to org.apache.spark and upgrading the Scala version. I’d be okay leaving 1.x to always use Scala 2.10, for example, or later cross-building it for Scala 2.11. Updating to 1.0 says two things: it tells users that they can be confident that version will be maintained for a long time, which we absolutely want to do, and it lets outsiders see that the project is now fairly mature (for many people, pre-1.0 might still cause them not to try it). I think both are good for the community.

Regarding binary compatibility, I agree that it’s what we should strive for, but it just seems premature to codify now. Let’s see how it works between, say, 1.0 and 1.1, and then we can codify it.

Matei

On Feb 6, 2014, at 10:43 AM, Henry Saputra wrote:
> Thanks Patrick for initiating the discussion about the next road map for Apache Spark.
>
> I am +1 for 0.10.0 for the next version.
Re: Proposal for Spark Release Strategy
Thanks Patrick for initiating the discussion about the next road map for Apache Spark.

I am +1 for 0.10.0 for the next version.

It will give us as a community some time to digest the process and the vision and make adjustments accordingly.

Releasing a 1.0.0 is a huge milestone, and if we do need to break the API or modify internal behavior dramatically, we could use the 1.0.0 release as the opportunity to do so.

- Henry

On Wed, Feb 5, 2014 at 9:52 PM, Andrew Ash wrote:
> Agree on timeboxed releases as well.
>
> Is there a vision for where we want to be as a project before declaring the
> first 1.0 release? While we're in the 0.x days, per semver, we can break
> backcompat at will (though we try to avoid it where possible), and that
> luxury goes away with 1.x. I just don't want to release a 1.0 simply
> because it seems to follow after 0.9, rather than making an intentional
> decision that we're at the point where we can stand by the current APIs and
> binary compatibility for the next year or so of the major release.
>
> Until that decision is made as a group, I'd rather we do an immediate
> version bump to 0.10.0-SNAPSHOT and then, if discussion warrants it later,
> replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to 1.0
> but not the other way around.
>
> https://github.com/apache/incubator-spark/pull/542
>
> Cheers!
> Andrew
>
> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun wrote:
>> +1 on time boxed releases and compatibility guidelines
>>
>> Am 06.02.2014 um 01:20 schrieb Patrick Wendell:
>>> Hi Everyone,
>>>
>>> In an effort to coordinate development amongst the growing list of
>>> Spark contributors, I've taken some time to write up a proposal to
>>> formalize various pieces of the development process. The next release
>>> of Spark will likely be Spark 1.0.0, so this message is intended in
>>> part to coordinate the release plan for 1.0.0 and future releases.
>>> I'll post this on the wiki after discussing it on this thread as
>>> tentative project guidelines.
>>>
>>> == Spark Release Structure ==
>>> Starting with Spark 1.0.0, the Spark project will follow the semantic
>>> versioning guidelines (http://semver.org/) with a few deviations.
>>> These small differences account for Spark's nature as a multi-module
>>> project.
>>>
>>> Each Spark release will be versioned:
>>> [MAJOR].[MINOR].[MAINTENANCE]
>>>
>>> All releases with the same major version number will have API
>>> compatibility, defined as [1]. Major version numbers will remain
>>> stable over long periods of time. For instance, 1.X.Y may last 1 year
>>> or more.
>>>
>>> Minor releases will typically contain new features and improvements.
>>> The target frequency for minor releases is every 3-4 months. One
>>> change we'd like to make is to announce fixed release dates and merge
>>> windows for each release, to facilitate coordination. Each minor
>>> release will have a merge window where new patches can be merged, a QA
>>> window when only fixes can be merged, then a final period where voting
>>> occurs on release candidates. These windows will be announced
>>> immediately after the previous minor release to give people plenty of
>>> time, and over time, we might make the whole release process more
>>> regular (similar to Ubuntu). At the bottom of this document is an
>>> example window for the 1.0.0 release.
>>>
>>> Maintenance releases will occur more frequently and depend on specific
>>> patches introduced (e.g. bug fixes) and their urgency. In general
>>> these releases are designed to patch bugs. However, higher-level
>>> libraries may introduce small features, such as a new algorithm,
>>> provided they are entirely additive and isolated from existing code
>>> paths. Spark core may not introduce any features.
>>>
>>> When new components are added to Spark, they may initially be marked
>>> as "alpha". Alpha components do not have to abide by the above
>>> guidelines; however, to the maximum extent possible, they should try
>>> to. Once they are marked "stable" they have to follow these
>>> guidelines. At present, GraphX is the only alpha component of Spark.
>>>
>>> [1] API compatibility:
>>>
>>> An API is any public class or interface exposed in Spark that is not
>>> marked as semi-private or experimental. Release A is API compatible
>>> with release B if code compiled against release A *compiles cleanly*
>>> against B. This does not guarantee that a compiled application that is
>>> linked against version A will link cleanly against version B without
>>> re-compiling. Link-level compatibility is something we'll try to
>>> guarantee as well, and we might make it a requirement in the future,
>>> but challenges with things like Scala versions have made this
>>> difficult to guarantee in the past.
>>>
>>> == Merging Pull Requests ==
>>> To merge pull requests, committers are encouraged to use this tool [2]
>>> to collapse the request into one commit.
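The versioning rules quoted above are mechanical enough to sketch in code. Below is a minimal, hypothetical Python illustration of the proposed [MAJOR].[MINOR].[MAINTENANCE] scheme; the class and function names are mine, not part of the proposal, and the sketch only encodes the two rules stated in it (same-major releases are API compatible; new features arrive in minor, not maintenance, releases of Spark core):

```python
from typing import NamedTuple

class SparkVersion(NamedTuple):
    """A release versioned [MAJOR].[MINOR].[MAINTENANCE], per the proposal."""
    major: int
    minor: int
    maintenance: int

    @classmethod
    def parse(cls, s: str) -> "SparkVersion":
        major, minor, maintenance = (int(p) for p in s.split("."))
        return cls(major, minor, maintenance)

def api_compatible(a: str, b: str) -> bool:
    """All releases with the same major version number promise API
    (source-level) compatibility; nothing is promised across majors."""
    return SparkVersion.parse(a).major == SparkVersion.parse(b).major

def may_add_core_features(old: str, new: str) -> bool:
    """New features land in minor releases; maintenance releases
    (same major.minor) are bug-fix only for Spark core."""
    o, n = SparkVersion.parse(old), SparkVersion.parse(new)
    return o.major == n.major and n.minor > o.minor

print(api_compatible("1.0.0", "1.1.0"))        # same major: compatible
print(api_compatible("1.9.2", "2.0.0"))        # major bump: no promise
print(may_add_core_features("1.0.0", "1.0.1")) # maintenance: fixes only
```

Note this says nothing about link-level (binary) compatibility, which the proposal deliberately leaves as a best-effort goal rather than a rule.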
Re: Proposal for Spark Release Strategy
> I like Heiko's proposal that requires every pull request to reference a
> JIRA. This is how things are done in Hadoop and it makes it much easier
> to, for example, find out whether an issue you came across when googling
> for an error is in a release.

I think this is a good idea and something on which there is wide consensus. I separately was going to suggest this in a later e-mail (it's not directly tied to versioning). One of many reasons this is necessary is that it's becoming hard to track which features ended up in which releases.

> I agree with Mridul about binary compatibility. It can be a dealbreaker
> for organizations that are considering an upgrade. The two ways I'm aware
> of that break binary compatibility are Scala version upgrades and messing
> around with inheritance. Are these not avoidable at least for minor
> releases?

This is clearly a goal, but I'm hesitant to codify it until we understand all of the reasons why it might not work. I've heard that in general with Scala there are many non-obvious things that can break binary compatibility, and we need to understand what they are. I'd propose we add the migration tool [1] to our build, use it for a few months, and see what happens (hat tip to Michael Armbrust). It's easy to formalize this as a requirement later; it's impossible to go the other direction.

For Scala major versions, it's possible we can cross-build between 2.10 and 2.11 to retain link-level compatibility. It's just entirely uncharted territory, and AFAIK no one who's suggesting this is speaking from experience maintaining this guarantee for a Scala project. That would be the most convincing reason for me: someone who has actually done this in the past in a Scala project and speaks from experience. Most of us are speaking from the perspective of Java projects, where we understand well the trade-offs and costs of maintaining this guarantee.

[1] https://github.com/typesafehub/migration-manager

- Patrick
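[Editorial note for context: wiring the migration tool (MiMa) into an sbt build looks roughly like the fragment below. This is a hypothetical sketch using MiMa's sbt plugin; the plugin version and the choice of spark-core/1.0.0 as the comparison baseline are illustrative, not taken from the thread.]

```scala
// project/plugins.sbt -- pull in Typesafe's migration-manager sbt plugin
addSbtPlugin("com.typesafe" % "sbt-mima-plugin" % "1.1.3")

// build.sbt -- compare current sources against the last released artifact
mimaPreviousArtifacts := Set("org.apache.spark" %% "spark-core" % "1.0.0")
```

Running `sbt mimaReportBinaryIssues` then reports any binary-incompatible changes (removed methods, altered class hierarchies) relative to the baseline release, which is exactly the kind of empirical trial proposed here before codifying the guarantee.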
Re: Proposal for Spark Release Strategy
Thanks for all this Patrick. I like Heiko's proposal that requires every pull request to reference a JIRA. This is how things are done in Hadoop and it makes it much easier to, for example, find out whether an issue you came across when googling for an error is in a release. I agree with Mridul about binary compatibility. It can be a dealbreaker for organizations that are considering an upgrade. The two ways I'm aware of that cause binary compatibility are scala version upgrades and messing around with inheritance. Are these not avoidable at least for minor releases? -Sandy On Thu, Feb 6, 2014 at 12:49 AM, Mridul Muralidharan wrote: > The reason I explicitly mentioned about binary compatibility was > because it was sort of hand waved in the proposal as good to have. > My understanding is that scala does make it painful to ensure binary > compatibility - but stability of interfaces is vital to ensure > dependable platforms. > Recompilation might be a viable option for developers - not for users. > > Regards, > Mridul > > > On Thu, Feb 6, 2014 at 12:08 PM, Patrick Wendell > wrote: > > If people feel that merging the intermediate SNAPSHOT number is > > significant, let's just defer merging that until this discussion > > concludes. > > > > That said - the decision to settle on 1.0 for the next release is not > > just because it happens to come after 0.9. It's a conscientious > > decision based on the development of the project to this point. A > > major focus of the 0.9 release was tying off loose ends in terms of > > backwards compatibility (e.g. spark configuration). There was some > > discussion back then of maybe cutting a 1.0 release but the decision > > was deferred until after 0.9. > > > > @mridul - pleas see the original post for discussion about binary > compatibility. > > > > On Wed, Feb 5, 2014 at 10:20 PM, Andy Konwinski > wrote: > >> +1 for 0.10.0 now with the option to switch to 1.0.0 after further > >> discussion. 
> >> On Feb 5, 2014 9:53 PM, "Andrew Ash" wrote: > >> > >>> Agree on timeboxed releases as well. > >>> > >>> Is there a vision for where we want to be as a project before > declaring the > >>> first 1.0 release? While we're in the 0.x days per semver we can break > >>> backcompat at will (though we try to avoid it where possible), and that > >>> luxury goes away with 1.x I just don't want to release a 1.0 simply > >>> because it seems to follow after 0.9 rather than making an intentional > >>> decision that we're at the point where we can stand by the current > APIs and > >>> binary compatibility for the next year or so of the major release. > >>> > >>> Until that decision is made as a group I'd rather we do an immediate > >>> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it > later, > >>> replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to > 1.0 > >>> but not the other way around. > >>> > >>> https://github.com/apache/incubator-spark/pull/542 > >>> > >>> Cheers! > >>> Andrew > >>> > >>> > >>> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun >>> >wrote: > >>> > >>> > +1 on time boxed releases and compatibility guidelines > >>> > > >>> > > >>> > > Am 06.02.2014 um 01:20 schrieb Patrick Wendell >: > >>> > > > >>> > > Hi Everyone, > >>> > > > >>> > > In an effort to coordinate development amongst the growing list of > >>> > > Spark contributors, I've taken some time to write up a proposal to > >>> > > formalize various pieces of the development process. The next > release > >>> > > of Spark will likely be Spark 1.0.0, so this message is intended in > >>> > > part to coordinate the release plan for 1.0.0 and future releases. > >>> > > I'll post this on the wiki after discussing it on this thread as > >>> > > tentative project guidelines. 
> >>> > > > >>> > > == Spark Release Structure == > >>> > > Starting with Spark 1.0.0, the Spark project will follow the > semantic > >>> > > versioning guidelines (http://semver.org/) with a few deviations. > >>> > > These small differences account for Spark's nature as a > multi-module > >>> > > project. > >>> > > > >>> > > Each Spark release will be versioned: > >>> > > [MAJOR].[MINOR].[MAINTENANCE] > >>> > > > >>> > > All releases with the same major version number will have API > >>> > > compatibility, defined as [1]. Major version numbers will remain > >>> > > stable over long periods of time. For instance, 1.X.Y may last 1 > year > >>> > > or more. > >>> > > > >>> > > Minor releases will typically contain new features and > improvements. > >>> > > The target frequency for minor releases is every 3-4 months. One > >>> > > change we'd like to make is to announce fixed release dates and > merge > >>> > > windows for each release, to facilitate coordination. Each minor > >>> > > release will have a merge window where new patches can be merged, > a QA > >>> > > window when only fixes can be merged, then a final period where > voting > >>> > > occurs on release candidates. These windows will be announced > >
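[Editor's note: the requirement discussed above (every pull request must reference a JIRA, with a possible Jenkins warning when one is missing) could be automated with a simple title check. The following is a hypothetical sketch, not project code; the `references_jira` helper name is invented, and only the `SPARK-NNNN` key format is taken from the thread.]

```python
import re

# Hypothetical helper: check whether a pull request title references
# a Spark JIRA ticket (keys look like "SPARK-1069", per the project's
# issue tracker). A CI hook could warn when the reference is missing.
JIRA_KEY = re.compile(r"\bSPARK-\d+\b")

def references_jira(pr_title: str) -> bool:
    """Return True if the PR title contains a SPARK-NNNN JIRA key."""
    return JIRA_KEY.search(pr_title) is not None
```

For example, `references_jira("SPARK-1069: Clarify link-level compatibility")` returns `True`, while a title like `"Fix typo in docs"` would trigger the warning.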
Re: Proposal for Spark Release Strategy
The reason I explicitly mentioned binary compatibility was because it was sort of hand waved in the proposal as good to have. My understanding is that scala does make it painful to ensure binary compatibility - but stability of interfaces is vital to ensure dependable platforms. Recompilation might be a viable option for developers - not for users. Regards, Mridul On Thu, Feb 6, 2014 at 12:08 PM, Patrick Wendell wrote: > If people feel that merging the intermediate SNAPSHOT number is > significant, let's just defer merging that until this discussion > concludes. > > That said - the decision to settle on 1.0 for the next release is not > just because it happens to come after 0.9. It's a conscientious > decision based on the development of the project to this point. A > major focus of the 0.9 release was tying off loose ends in terms of > backwards compatibility (e.g. spark configuration). There was some > discussion back then of maybe cutting a 1.0 release but the decision > was deferred until after 0.9. > > @mridul - please see the original post for discussion about binary > compatibility. > > On Wed, Feb 5, 2014 at 10:20 PM, Andy Konwinski > wrote: >> +1 for 0.10.0 now with the option to switch to 1.0.0 after further >> discussion. >> On Feb 5, 2014 9:53 PM, "Andrew Ash" wrote: >> >>> Agree on timeboxed releases as well. >>> >>> Is there a vision for where we want to be as a project before declaring the >>> first 1.0 release? While we're in the 0.x days per semver we can break >>> backcompat at will (though we try to avoid it where possible), and that >>> luxury goes away with 1.x I just don't want to release a 1.0 simply >>> because it seems to follow after 0.9 rather than making an intentional >>> decision that we're at the point where we can stand by the current APIs and >>> binary compatibility for the next year or so of the major release.
>>> >>> Until that decision is made as a group I'd rather we do an immediate >>> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later, >>> replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to 1.0 >>> but not the other way around. >>> >>> https://github.com/apache/incubator-spark/pull/542 >>> >>> Cheers! >>> Andrew >>> >>> >>> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun >> >wrote: >>> >>> > +1 on time boxed releases and compatibility guidelines >>> > >>> > >>> > > Am 06.02.2014 um 01:20 schrieb Patrick Wendell : >>> > > >>> > > Hi Everyone, >>> > > >>> > > In an effort to coordinate development amongst the growing list of >>> > > Spark contributors, I've taken some time to write up a proposal to >>> > > formalize various pieces of the development process. The next release >>> > > of Spark will likely be Spark 1.0.0, so this message is intended in >>> > > part to coordinate the release plan for 1.0.0 and future releases. >>> > > I'll post this on the wiki after discussing it on this thread as >>> > > tentative project guidelines. >>> > > >>> > > == Spark Release Structure == >>> > > Starting with Spark 1.0.0, the Spark project will follow the semantic >>> > > versioning guidelines (http://semver.org/) with a few deviations. >>> > > These small differences account for Spark's nature as a multi-module >>> > > project. >>> > > >>> > > Each Spark release will be versioned: >>> > > [MAJOR].[MINOR].[MAINTENANCE] >>> > > >>> > > All releases with the same major version number will have API >>> > > compatibility, defined as [1]. Major version numbers will remain >>> > > stable over long periods of time. For instance, 1.X.Y may last 1 year >>> > > or more. >>> > > >>> > > Minor releases will typically contain new features and improvements. >>> > > The target frequency for minor releases is every 3-4 months. 
One >>> > > change we'd like to make is to announce fixed release dates and merge >>> > > windows for each release, to facilitate coordination. Each minor >>> > > release will have a merge window where new patches can be merged, a QA >>> > > window when only fixes can be merged, then a final period where voting >>> > > occurs on release candidates. These windows will be announced >>> > > immediately after the previous minor release to give people plenty of >>> > > time, and over time, we might make the whole release process more >>> > > regular (similar to Ubuntu). At the bottom of this document is an >>> > > example window for the 1.0.0 release. >>> > > >>> > > Maintenance releases will occur more frequently and depend on specific >>> > > patches introduced (e.g. bug fixes) and their urgency. In general >>> > > these releases are designed to patch bugs. However, higher level >>> > > libraries may introduce small features, such as a new algorithm, >>> > > provided they are entirely additive and isolated from existing code >>> > > paths. Spark core may not introduce any features. >>> > > >>> > > When new components are added to Spark, they may initially be marked >>> > > as "alpha". Alpha components do not have to abide by the above guidelines, however, to the maximum extent possible, they should try to.
Re: Proposal for Spark Release Strategy
If we could minimize the external dependencies, it would certainly be beneficial long term. > Am 06.02.2014 um 07:37 schrieb Mridul Muralidharan : > > > b) minimize external dependencies - some of them would go away/not be > actively maintained.
Re: Proposal for Spark Release Strategy
If people feel that merging the intermediate SNAPSHOT number is significant, let's just defer merging that until this discussion concludes. That said - the decision to settle on 1.0 for the next release is not just because it happens to come after 0.9. It's a conscientious decision based on the development of the project to this point. A major focus of the 0.9 release was tying off loose ends in terms of backwards compatibility (e.g. spark configuration). There was some discussion back then of maybe cutting a 1.0 release but the decision was deferred until after 0.9. @mridul - please see the original post for discussion about binary compatibility. On Wed, Feb 5, 2014 at 10:20 PM, Andy Konwinski wrote: > +1 for 0.10.0 now with the option to switch to 1.0.0 after further > discussion. > On Feb 5, 2014 9:53 PM, "Andrew Ash" wrote: > >> Agree on timeboxed releases as well. >> >> Is there a vision for where we want to be as a project before declaring the >> first 1.0 release? While we're in the 0.x days per semver we can break >> backcompat at will (though we try to avoid it where possible), and that >> luxury goes away with 1.x I just don't want to release a 1.0 simply >> because it seems to follow after 0.9 rather than making an intentional >> decision that we're at the point where we can stand by the current APIs and >> binary compatibility for the next year or so of the major release. >> >> Until that decision is made as a group I'd rather we do an immediate >> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later, >> replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to 1.0 >> but not the other way around. >> >> https://github.com/apache/incubator-spark/pull/542 >> >> Cheers!
>> Andrew >> >> >> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun > >wrote: >> >> > +1 on time boxed releases and compatibility guidelines >> > >> > >> > > Am 06.02.2014 um 01:20 schrieb Patrick Wendell : >> > > >> > > Hi Everyone, >> > > >> > > In an effort to coordinate development amongst the growing list of >> > > Spark contributors, I've taken some time to write up a proposal to >> > > formalize various pieces of the development process. The next release >> > > of Spark will likely be Spark 1.0.0, so this message is intended in >> > > part to coordinate the release plan for 1.0.0 and future releases. >> > > I'll post this on the wiki after discussing it on this thread as >> > > tentative project guidelines. >> > > >> > > == Spark Release Structure == >> > > Starting with Spark 1.0.0, the Spark project will follow the semantic >> > > versioning guidelines (http://semver.org/) with a few deviations. >> > > These small differences account for Spark's nature as a multi-module >> > > project. >> > > >> > > Each Spark release will be versioned: >> > > [MAJOR].[MINOR].[MAINTENANCE] >> > > >> > > All releases with the same major version number will have API >> > > compatibility, defined as [1]. Major version numbers will remain >> > > stable over long periods of time. For instance, 1.X.Y may last 1 year >> > > or more. >> > > >> > > Minor releases will typically contain new features and improvements. >> > > The target frequency for minor releases is every 3-4 months. One >> > > change we'd like to make is to announce fixed release dates and merge >> > > windows for each release, to facilitate coordination. Each minor >> > > release will have a merge window where new patches can be merged, a QA >> > > window when only fixes can be merged, then a final period where voting >> > > occurs on release candidates. 
These windows will be announced >> > > immediately after the previous minor release to give people plenty of >> > > time, and over time, we might make the whole release process more >> > > regular (similar to Ubuntu). At the bottom of this document is an >> > > example window for the 1.0.0 release. >> > > >> > > Maintenance releases will occur more frequently and depend on specific >> > > patches introduced (e.g. bug fixes) and their urgency. In general >> > > these releases are designed to patch bugs. However, higher level >> > > libraries may introduce small features, such as a new algorithm, >> > > provided they are entirely additive and isolated from existing code >> > > paths. Spark core may not introduce any features. >> > > >> > > When new components are added to Spark, they may initially be marked >> > > as "alpha". Alpha components do not have to abide by the above >> > > guidelines, however, to the maximum extent possible, they should try >> > > to. Once they are marked "stable" they have to follow these >> > > guidelines. At present, GraphX is the only alpha component of Spark. >> > > >> > > [1] API compatibility: >> > > >> > > An API is any public class or interface exposed in Spark that is not >> > > marked as semi-private or experimental. Release A is API compatible >> > > with release B if code compiled against release A *compiles cleanly* >> > > against B. This does not guarantee that a compiled application that is linked against version A will link cleanly against version B without re-compiling.
Re: Proposal for Spark Release Strategy
Before we move to 1.0, we need to address two things : a) backward compatibility not just at api level, but also at binary level (not forcing recompile). b) minimize external dependencies - some of them would go away/not be actively maintained. Regards, Mridul On Thu, Feb 6, 2014 at 11:50 AM, Andy Konwinski wrote: > +1 for 0.10.0 now with the option to switch to 1.0.0 after further > discussion. > On Feb 5, 2014 9:53 PM, "Andrew Ash" wrote: > >> Agree on timeboxed releases as well. >> >> Is there a vision for where we want to be as a project before declaring the >> first 1.0 release? While we're in the 0.x days per semver we can break >> backcompat at will (though we try to avoid it where possible), and that >> luxury goes away with 1.x I just don't want to release a 1.0 simply >> because it seems to follow after 0.9 rather than making an intentional >> decision that we're at the point where we can stand by the current APIs and >> binary compatibility for the next year or so of the major release. >> >> Until that decision is made as a group I'd rather we do an immediate >> version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later, >> replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to 1.0 >> but not the other way around. >> >> https://github.com/apache/incubator-spark/pull/542 >> >> Cheers! >> Andrew >> >> >> On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun > >wrote: >> >> > +1 on time boxed releases and compatibility guidelines >> > >> > >> > > Am 06.02.2014 um 01:20 schrieb Patrick Wendell : >> > > >> > > Hi Everyone, >> > > >> > > In an effort to coordinate development amongst the growing list of >> > > Spark contributors, I've taken some time to write up a proposal to >> > > formalize various pieces of the development process. The next release >> > > of Spark will likely be Spark 1.0.0, so this message is intended in >> > > part to coordinate the release plan for 1.0.0 and future releases. 
>> > > I'll post this on the wiki after discussing it on this thread as >> > > tentative project guidelines. >> > > >> > > == Spark Release Structure == >> > > Starting with Spark 1.0.0, the Spark project will follow the semantic >> > > versioning guidelines (http://semver.org/) with a few deviations. >> > > These small differences account for Spark's nature as a multi-module >> > > project. >> > > >> > > Each Spark release will be versioned: >> > > [MAJOR].[MINOR].[MAINTENANCE] >> > > >> > > All releases with the same major version number will have API >> > > compatibility, defined as [1]. Major version numbers will remain >> > > stable over long periods of time. For instance, 1.X.Y may last 1 year >> > > or more. >> > > >> > > Minor releases will typically contain new features and improvements. >> > > The target frequency for minor releases is every 3-4 months. One >> > > change we'd like to make is to announce fixed release dates and merge >> > > windows for each release, to facilitate coordination. Each minor >> > > release will have a merge window where new patches can be merged, a QA >> > > window when only fixes can be merged, then a final period where voting >> > > occurs on release candidates. These windows will be announced >> > > immediately after the previous minor release to give people plenty of >> > > time, and over time, we might make the whole release process more >> > > regular (similar to Ubuntu). At the bottom of this document is an >> > > example window for the 1.0.0 release. >> > > >> > > Maintenance releases will occur more frequently and depend on specific >> > > patches introduced (e.g. bug fixes) and their urgency. In general >> > > these releases are designed to patch bugs. However, higher level >> > > libraries may introduce small features, such as a new algorithm, >> > > provided they are entirely additive and isolated from existing code >> > > paths. Spark core may not introduce any features. 
>> > > >> > > When new components are added to Spark, they may initially be marked >> > > as "alpha". Alpha components do not have to abide by the above >> > > guidelines, however, to the maximum extent possible, they should try >> > > to. Once they are marked "stable" they have to follow these >> > > guidelines. At present, GraphX is the only alpha component of Spark. >> > > >> > > [1] API compatibility: >> > > >> > > An API is any public class or interface exposed in Spark that is not >> > > marked as semi-private or experimental. Release A is API compatible >> > > with release B if code compiled against release A *compiles cleanly* >> > > against B. This does not guarantee that a compiled application that is >> > > linked against version A will link cleanly against version B without >> > > re-compiling. Link-level compatibility is something we'll try to >> > > guarantee as well, and we might make it a requirement in the >> > > future, but challenges with things like Scala versions have made this >> > > difficult to guarantee in the past.
Re: Proposal for Spark Release Strategy
+1 for 0.10.0 now with the option to switch to 1.0.0 after further discussion. On Feb 5, 2014 9:53 PM, "Andrew Ash" wrote: > Agree on timeboxed releases as well. > > Is there a vision for where we want to be as a project before declaring the > first 1.0 release? While we're in the 0.x days per semver we can break > backcompat at will (though we try to avoid it where possible), and that > luxury goes away with 1.x I just don't want to release a 1.0 simply > because it seems to follow after 0.9 rather than making an intentional > decision that we're at the point where we can stand by the current APIs and > binary compatibility for the next year or so of the major release. > > Until that decision is made as a group I'd rather we do an immediate > version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later, > replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to 1.0 > but not the other way around. > > https://github.com/apache/incubator-spark/pull/542 > > Cheers! > Andrew > > > On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun >wrote: > > > +1 on time boxed releases and compatibility guidelines > > > > > > > Am 06.02.2014 um 01:20 schrieb Patrick Wendell : > > > > > > Hi Everyone, > > > > > > In an effort to coordinate development amongst the growing list of > > > Spark contributors, I've taken some time to write up a proposal to > > > formalize various pieces of the development process. The next release > > > of Spark will likely be Spark 1.0.0, so this message is intended in > > > part to coordinate the release plan for 1.0.0 and future releases. > > > I'll post this on the wiki after discussing it on this thread as > > > tentative project guidelines. > > > > > > == Spark Release Structure == > > > Starting with Spark 1.0.0, the Spark project will follow the semantic > > > versioning guidelines (http://semver.org/) with a few deviations. > > > These small differences account for Spark's nature as a multi-module > > > project. 
> > > > > > Each Spark release will be versioned: > > > [MAJOR].[MINOR].[MAINTENANCE] > > > > > > All releases with the same major version number will have API > > > compatibility, defined as [1]. Major version numbers will remain > > > stable over long periods of time. For instance, 1.X.Y may last 1 year > > > or more. > > > > > > Minor releases will typically contain new features and improvements. > > > The target frequency for minor releases is every 3-4 months. One > > > change we'd like to make is to announce fixed release dates and merge > > > windows for each release, to facilitate coordination. Each minor > > > release will have a merge window where new patches can be merged, a QA > > > window when only fixes can be merged, then a final period where voting > > > occurs on release candidates. These windows will be announced > > > immediately after the previous minor release to give people plenty of > > > time, and over time, we might make the whole release process more > > > regular (similar to Ubuntu). At the bottom of this document is an > > > example window for the 1.0.0 release. > > > > > > Maintenance releases will occur more frequently and depend on specific > > > patches introduced (e.g. bug fixes) and their urgency. In general > > > these releases are designed to patch bugs. However, higher level > > > libraries may introduce small features, such as a new algorithm, > > > provided they are entirely additive and isolated from existing code > > > paths. Spark core may not introduce any features. > > > > > > When new components are added to Spark, they may initially be marked > > > as "alpha". Alpha components do not have to abide by the above > > > guidelines, however, to the maximum extent possible, they should try > > > to. Once they are marked "stable" they have to follow these > > > guidelines. At present, GraphX is the only alpha component of Spark. 
> > > > > > [1] API compatibility: > > > > > > An API is any public class or interface exposed in Spark that is not > > > marked as semi-private or experimental. Release A is API compatible > > > with release B if code compiled against release A *compiles cleanly* > > > against B. This does not guarantee that a compiled application that is > > > linked against version A will link cleanly against version B without > > > re-compiling. Link-level compatibility is something we'll try to > > > guarantee as well, and we might make it a requirement in the > > > future, but challenges with things like Scala versions have made this > > > difficult to guarantee in the past. > > > > > > == Merging Pull Requests == > > > To merge pull requests, committers are encouraged to use this tool [2] > > > to collapse the request into one commit rather than manually > > > performing git merges. It will also format the commit message nicely > > > in a way that can be easily parsed later when writing credits. > > > Currently it is maintained in a public utility repository, but we'll > > > merge it into mainline Spark soon.
Re: Proposal for Spark Release Strategy
Agree on timeboxed releases as well. Is there a vision for where we want to be as a project before declaring the first 1.0 release? While we're in the 0.x days per semver we can break backcompat at will (though we try to avoid it where possible), and that luxury goes away with 1.x. I just don't want to release a 1.0 simply because it seems to follow after 0.9 rather than making an intentional decision that we're at the point where we can stand by the current APIs and binary compatibility for the next year or so of the major release. Until that decision is made as a group I'd rather we do an immediate version bump to 0.10.0-SNAPSHOT and then if discussion warrants it later, replace that with 1.0.0-SNAPSHOT. It's very easy to go from 0.10 to 1.0 but not the other way around. https://github.com/apache/incubator-spark/pull/542 Cheers! Andrew On Wed, Feb 5, 2014 at 9:49 PM, Heiko Braun wrote: > +1 on time boxed releases and compatibility guidelines > > > > Am 06.02.2014 um 01:20 schrieb Patrick Wendell : > > > > Hi Everyone, > > > > In an effort to coordinate development amongst the growing list of > > Spark contributors, I've taken some time to write up a proposal to > > formalize various pieces of the development process. The next release > > of Spark will likely be Spark 1.0.0, so this message is intended in > > part to coordinate the release plan for 1.0.0 and future releases. > > I'll post this on the wiki after discussing it on this thread as > > tentative project guidelines. > > > > == Spark Release Structure == > > Starting with Spark 1.0.0, the Spark project will follow the semantic > > versioning guidelines (http://semver.org/) with a few deviations. > > These small differences account for Spark's nature as a multi-module > > project. > > > > Each Spark release will be versioned: > > [MAJOR].[MINOR].[MAINTENANCE] > > > > All releases with the same major version number will have API > > compatibility, defined as [1].
Major version numbers will remain > > stable over long periods of time. For instance, 1.X.Y may last 1 year > > or more. > > > > Minor releases will typically contain new features and improvements. > > The target frequency for minor releases is every 3-4 months. One > > change we'd like to make is to announce fixed release dates and merge > > windows for each release, to facilitate coordination. Each minor > > release will have a merge window where new patches can be merged, a QA > > window when only fixes can be merged, then a final period where voting > > occurs on release candidates. These windows will be announced > > immediately after the previous minor release to give people plenty of > > time, and over time, we might make the whole release process more > > regular (similar to Ubuntu). At the bottom of this document is an > > example window for the 1.0.0 release. > > > > Maintenance releases will occur more frequently and depend on specific > > patches introduced (e.g. bug fixes) and their urgency. In general > > these releases are designed to patch bugs. However, higher level > > libraries may introduce small features, such as a new algorithm, > > provided they are entirely additive and isolated from existing code > > paths. Spark core may not introduce any features. > > > > When new components are added to Spark, they may initially be marked > > as "alpha". Alpha components do not have to abide by the above > > guidelines, however, to the maximum extent possible, they should try > > to. Once they are marked "stable" they have to follow these > > guidelines. At present, GraphX is the only alpha component of Spark. > > > > [1] API compatibility: > > > > An API is any public class or interface exposed in Spark that is not > > marked as semi-private or experimental. Release A is API compatible > > with release B if code compiled against release A *compiles cleanly* > > against B. 
This does not guarantee that a compiled application that is > > linked against version A will link cleanly against version B without > > re-compiling. Link-level compatibility is something we'll try to > > guarantee as well, and we might make it a requirement in the > > future, but challenges with things like Scala versions have made this > > difficult to guarantee in the past. > > > > == Merging Pull Requests == > > To merge pull requests, committers are encouraged to use this tool [2] > > to collapse the request into one commit rather than manually > > performing git merges. It will also format the commit message nicely > > in a way that can be easily parsed later when writing credits. > > Currently it is maintained in a public utility repository, but we'll > > merge it into mainline Spark soon. > > > > [2] > https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py > > > > == Tentative Release Window for 1.0.0 == > > Feb 1st - April 1st: General development > > April 1st: Code freeze for new features > > April 15th: RC1 > > > > == Deviations == > > For now, the proposal is to consider these tentative guidelines.
Re: Proposal for Spark Release Strategy
+1 on time boxed releases and compatibility guidelines > Am 06.02.2014 um 01:20 schrieb Patrick Wendell : > > Hi Everyone, > > In an effort to coordinate development amongst the growing list of > Spark contributors, I've taken some time to write up a proposal to > formalize various pieces of the development process. The next release > of Spark will likely be Spark 1.0.0, so this message is intended in > part to coordinate the release plan for 1.0.0 and future releases. > I'll post this on the wiki after discussing it on this thread as > tentative project guidelines. > > == Spark Release Structure == > Starting with Spark 1.0.0, the Spark project will follow the semantic > versioning guidelines (http://semver.org/) with a few deviations. > These small differences account for Spark's nature as a multi-module > project. > > Each Spark release will be versioned: > [MAJOR].[MINOR].[MAINTENANCE] > > All releases with the same major version number will have API > compatibility, defined as [1]. Major version numbers will remain > stable over long periods of time. For instance, 1.X.Y may last 1 year > or more. > > Minor releases will typically contain new features and improvements. > The target frequency for minor releases is every 3-4 months. One > change we'd like to make is to announce fixed release dates and merge > windows for each release, to facilitate coordination. Each minor > release will have a merge window where new patches can be merged, a QA > window when only fixes can be merged, then a final period where voting > occurs on release candidates. These windows will be announced > immediately after the previous minor release to give people plenty of > time, and over time, we might make the whole release process more > regular (similar to Ubuntu). At the bottom of this document is an > example window for the 1.0.0 release. > > Maintenance releases will occur more frequently and depend on specific > patches introduced (e.g. bug fixes) and their urgency. 
In general > these releases are designed to patch bugs. However, higher level > libraries may introduce small features, such as a new algorithm, > provided they are entirely additive and isolated from existing code > paths. Spark core may not introduce any features. > > When new components are added to Spark, they may initially be marked > as "alpha". Alpha components do not have to abide by the above > guidelines, however, to the maximum extent possible, they should try > to. Once they are marked "stable" they have to follow these > guidelines. At present, GraphX is the only alpha component of Spark. > > [1] API compatibility: > > An API is any public class or interface exposed in Spark that is not > marked as semi-private or experimental. Release A is API compatible > with release B if code compiled against release A *compiles cleanly* > against B. This does not guarantee that a compiled application that is > linked against version A will link cleanly against version B without > re-compiling. Link-level compatibility is something we'll try to > guarantee as well, and we might make it a requirement in the > future, but challenges with things like Scala versions have made this > difficult to guarantee in the past. > > == Merging Pull Requests == > To merge pull requests, committers are encouraged to use this tool [2] > to collapse the request into one commit rather than manually > performing git merges. It will also format the commit message nicely > in a way that can be easily parsed later when writing credits. > Currently it is maintained in a public utility repository, but we'll > merge it into mainline Spark soon. > > [2] https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py > > == Tentative Release Window for 1.0.0 == > Feb 1st - April 1st: General development > April 1st: Code freeze for new features > April 15th: RC1 > > == Deviations == > For now, the proposal is to consider these tentative guidelines.
We > can vote to formalize these as project rules at a later time after > some experience working with them. Once formalized, any deviation to > these guidelines will be subject to a lazy majority vote. > > - Patrick
Re: Proposal for Spark Release Strategy
I would even take it further when it comes to PRs:

- any PR needs to reference a JIRA
- the PR should be rebased before submitting, to avoid merge commits
- as Patrick said: require squashed commits

/heiko

> On 06.02.2014 at 01:39, Mark Hamstra wrote:
>
> I would strongly encourage that developers submitting pull requests include
> within the description of that PR whether you intend the contribution to be
> mergeable at the maintenance level, minor level, or major level.
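The first rule above (every PR references a JIRA) is the kind of thing a bot hooked into the GitHub API could enforce automatically. Purely as an illustrative sketch (hypothetical helper, assuming Spark JIRA keys of the form SPARK-NNNN in the PR title):

```python
import re

# Hypothetical check: does a PR title reference a Spark JIRA issue?
# Spark JIRA keys look like "SPARK-1234". A Jenkins hook could run
# this against each pull request and post a warning when it fails.
JIRA_KEY = re.compile(r"\bSPARK-\d+\b")

def references_jira(pr_title):
    """Return True if the PR title mentions at least one JIRA key."""
    return JIRA_KEY.search(pr_title) is not None
```

For example, `references_jira("SPARK-1070: check PR titles for JIRA refs")` passes, while a title with no key would trigger the warning.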
Re: Proposal for Spark Release Strategy
Yup, the intended merge level is just a hint; the responsibility still lies with the committers. It can be a helpful hint, though.

On Wed, Feb 5, 2014 at 4:55 PM, Patrick Wendell wrote:
> [...]
Re: Proposal for Spark Release Strategy
> How are Alpha components and higher level libraries which may add small
> features within a maintenance release going to be marked with that status?
> Somehow/somewhere within the code itself, or just as some kind of external
> reference?

I think we'd mark alpha features as such in the java/scaladoc. This is what Scala does with experimental features. Higher level libraries are anything that isn't Spark core. Maybe we can formalize this more somehow.

We might be able to annotate the new features as experimental if they end up in a patch release. This could make it more clear.

> I would strongly encourage that developers submitting pull requests include
> within the description of that PR whether you intend the contribution to be
> mergeable at the maintenance level, minor level, or major level. That will
> help those of us doing code reviews and merges decide where the code should
> go and how closely to scrutinize the PR for changes that are not compatible
> with the intended release level.

I'd say the default is the minor level. If contributors know it should be added in a maintenance release, it's great if they say so. However, I'd say this is also a responsibility of the committers, since individual contributors may not know. It will probably be a while before major level patches are being merged :P
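The marking discussed above would live in the java/scaladoc or an annotation; purely as an illustration of the idea (hypothetical, not Spark code), an "experimental" marker that warns callers might look like this in Python:

```python
import functools
import warnings

def experimental(func):
    """Illustrative marker for an API with no stability guarantee.

    Only a sketch of the idea: callers of an experimental feature get a
    warning that the API may change even within a minor release.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(f"{func.__name__} is experimental and may change "
                      "in a future release", FutureWarning, stacklevel=2)
        return func(*args, **kwargs)
    return wrapper

@experimental
def new_algorithm(x):
    # Stand-in for a small additive feature in a higher-level library.
    return x * 2
```

A doc-level tag (as in scaladoc) carries the same information without any runtime cost, which is presumably why that route was preferred.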
Re: Proposal for Spark Release Strategy
Looks good. One question and one comment:

How are Alpha components and higher level libraries which may add small features within a maintenance release going to be marked with that status? Somehow/somewhere within the code itself, or just as some kind of external reference?

I would strongly encourage that developers submitting pull requests include within the description of that PR whether you intend the contribution to be mergeable at the maintenance level, minor level, or major level. That will help those of us doing code reviews and merges decide where the code should go and how closely to scrutinize the PR for changes that are not compatible with the intended release level.

On Wed, Feb 5, 2014 at 4:20 PM, Patrick Wendell wrote:
> [...]
Proposal for Spark Release Strategy
Hi Everyone,

In an effort to coordinate development amongst the growing list of Spark contributors, I've taken some time to write up a proposal to formalize various pieces of the development process. The next release of Spark will likely be Spark 1.0.0, so this message is intended in part to coordinate the release plan for 1.0.0 and future releases. I'll post this on the wiki after discussing it on this thread as tentative project guidelines.

== Spark Release Structure ==

Starting with Spark 1.0.0, the Spark project will follow the semantic versioning guidelines (http://semver.org/) with a few deviations. These small differences account for Spark's nature as a multi-module project.

Each Spark release will be versioned:
[MAJOR].[MINOR].[MAINTENANCE]

All releases with the same major version number will have API compatibility, as defined in [1]. Major version numbers will remain stable over long periods of time. For instance, 1.X.Y may last a year or more.

Minor releases will typically contain new features and improvements. The target frequency for minor releases is every 3-4 months. One change we'd like to make is to announce fixed release dates and merge windows for each release, to facilitate coordination. Each minor release will have a merge window where new patches can be merged, a QA window when only fixes can be merged, then a final period where voting occurs on release candidates. These windows will be announced immediately after the previous minor release to give people plenty of time, and over time, we might make the whole release process more regular (similar to Ubuntu). At the bottom of this document is an example window for the 1.0.0 release.

Maintenance releases will occur more frequently and depend on the specific patches introduced (e.g. bug fixes) and their urgency. In general these releases are designed to patch bugs. However, higher level libraries may introduce small features, such as a new algorithm, provided they are entirely additive and isolated from existing code paths. Spark core may not introduce any features.

When new components are added to Spark, they may initially be marked as "alpha". Alpha components do not have to abide by the above guidelines; however, to the maximum extent possible, they should try to. Once they are marked "stable" they have to follow these guidelines. At present, GraphX is the only alpha component of Spark.

[1] API compatibility:

An API is any public class or interface exposed in Spark that is not marked as semi-private or experimental. Release A is API compatible with release B if code compiled against release A *compiles cleanly* against B. This does not guarantee that a compiled application linked against version A will link cleanly against version B without re-compiling. Link-level compatibility is something we'll try to guarantee as well, and we might make it a requirement in the future, but challenges with things like Scala versions have made this difficult to guarantee in the past.

== Merging Pull Requests ==

To merge pull requests, committers are encouraged to use this tool [2] to collapse the request into one commit rather than manually performing git merges. It will also format the commit message nicely in a way that can be easily parsed later when writing credits. Currently it is maintained in a public utility repository, but we'll merge it into mainline Spark soon.

[2] https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py

== Tentative Release Window for 1.0.0 ==
Feb 1st - April 1st: General development
April 1st: Code freeze for new features
April 15th: RC1

== Deviations ==

For now, the proposal is to consider these tentative guidelines. We can vote to formalize these as project rules at a later time after some experience working with them. Once formalized, any deviation to these guidelines will be subject to a lazy majority vote.

- Patrick
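The versioning scheme and the compatibility rule in [1] can be sketched as follows (a hypothetical illustration of the proposal, not project code):

```python
from collections import namedtuple

# [MAJOR].[MINOR].[MAINTENANCE], per the proposal above.
Version = namedtuple("Version", ["major", "minor", "maintenance"])

def parse_version(s):
    """Parse a version string such as '1.0.0' into its three parts."""
    major, minor, maintenance = (int(part) for part in s.split("."))
    return Version(major, minor, maintenance)

def api_compatible(a, b):
    """All releases sharing a major version promise API (source)
    compatibility; link-level compatibility is a goal, not yet a
    guarantee."""
    return parse_version(a).major == parse_version(b).major
```

Under this rule, 1.0.0 and 1.1.0 are API compatible, while moving to 2.0.0 may break APIs.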