Echoing Shivaram here. I don't think it makes a lot of sense to add more features to the 1.x line. We should still do critical bug fixes though.
On Tue, Nov 10, 2015 at 4:23 PM, Shivaram Venkataraman
<shiva...@eecs.berkeley.edu> wrote:
> +1
>
> On a related note, I think making it lightweight will ensure that we
> stay on the current release schedule and don't unnecessarily delay 2.0
> to wait for new features / big architectural changes.
>
> In terms of fixes to 1.x, I think our current policy of back-porting
> fixes to older releases would still apply. I don't think developing
> new features on both 1.x and 2.x makes a lot of sense, as we would
> like users to switch to 2.x.
>
> Shivaram
>
> On Tue, Nov 10, 2015 at 4:02 PM, Kostas Sakellis <kos...@cloudera.com> wrote:
> > +1 on a lightweight 2.0
> >
> > What is the thinking around the 1.x line after Spark 2.0 is released?
> > If not terminated, how will we determine what goes into each major
> > version line? Will 1.x only be for stability fixes?
> >
> > Thanks,
> > Kostas
> >
> > On Tue, Nov 10, 2015 at 3:41 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> >>
> >> I also feel the same as Reynold. I agree we should minimize API
> >> breaks and focus on fixing things around the edges that were
> >> mistakes (e.g. exposing Guava and Akka) rather than any overhaul
> >> that could fragment the community. Ideally a major release is a
> >> lightweight process we can do every couple of years, with minimal
> >> impact for users.
> >>
> >> - Patrick
> >>
> >> On Tue, Nov 10, 2015 at 3:35 PM, Nicholas Chammas
> >> <nicholas.cham...@gmail.com> wrote:
> >>>
> >>> > For this reason, I would *not* propose doing major releases to
> >>> > break substantial APIs or perform large re-architecting that
> >>> > prevents users from upgrading. Spark has always had a culture of
> >>> > evolving its architecture incrementally and making changes - and
> >>> > I don't think we want to change this model.
> >>>
> >>> +1 for this. The Python community went through a lot of turmoil
> >>> over the Python 2 -> Python 3 transition because the upgrade
> >>> process was too painful for too long. The Spark community will
> >>> benefit greatly from explicitly looking to avoid a similar
> >>> situation.
> >>>
> >>> > 3. Assembly-free distribution of Spark: don't require building
> >>> > an enormous assembly jar in order to run Spark.
> >>>
> >>> Could you elaborate a bit on this? I'm not sure what an
> >>> assembly-free distribution means.
> >>>
> >>> Nick
> >>>
> >>> On Tue, Nov 10, 2015 at 6:11 PM Reynold Xin <r...@databricks.com> wrote:
> >>>>
> >>>> I'm starting a new thread since the other one got intermixed with
> >>>> feature requests. Please refrain from making feature requests in
> >>>> this thread. Not that we shouldn't be adding features, but we can
> >>>> always add features in 1.7, 2.1, 2.2, ...
> >>>>
> >>>> First - I want to propose a premise for how to think about Spark
> >>>> 2.0 and major releases in Spark, based on discussion with several
> >>>> members of the community: a major release should be low overhead
> >>>> and minimally disruptive to the Spark community. A major release
> >>>> should not be very different from a minor release and should not
> >>>> be gated on new features. The main purpose of a major release is
> >>>> an opportunity to fix things that are broken in the current API
> >>>> and remove certain deprecated APIs (examples follow).
> >>>>
> >>>> For this reason, I would *not* propose doing major releases to
> >>>> break substantial APIs or perform large re-architecting that
> >>>> prevents users from upgrading.
> >>>> Spark has always had a culture of evolving its architecture
> >>>> incrementally and making changes - and I don't think we want to
> >>>> change this model. In fact, we've released many architectural
> >>>> changes on the 1.x line.
> >>>>
> >>>> If the community likes the above model, then to me it seems
> >>>> reasonable to do Spark 2.0 either after Spark 1.6 (in lieu of
> >>>> Spark 1.7) or immediately after Spark 1.7. That would be 18 or
> >>>> 21 months since Spark 1.0. A cadence of major releases every
> >>>> 2 years seems doable within the above model.
> >>>>
> >>>> Under this model, here is a list of example things I would
> >>>> propose doing in Spark 2.0, separated into APIs and
> >>>> Operation/Deployment:
> >>>>
> >>>>
> >>>> APIs
> >>>>
> >>>> 1. Remove interfaces, configs, and modules (e.g. Bagel)
> >>>> deprecated in Spark 1.x.
> >>>>
> >>>> 2. Remove Akka from Spark's API dependency (in streaming), so
> >>>> user applications can use Akka (SPARK-5293). We have gotten a
> >>>> lot of complaints about user applications being unable to use
> >>>> Akka due to Spark's dependency on it.
> >>>>
> >>>> 3. Remove Guava from Spark's public API (JavaRDD Optional).
> >>>>
> >>>> 4. Better class package structure for low-level developer APIs.
> >>>> In particular, we have some DeveloperApi classes (mostly various
> >>>> listener-related classes) added over the years. Some packages
> >>>> include only one or two public classes but a lot of private
> >>>> classes. A better structure is to isolate public classes in a
> >>>> few public packages, and these public packages should contain
> >>>> minimal private classes.
> >>>>
> >>>> 5. Consolidate the task metric and accumulator APIs. Despite
> >>>> some subtle differences, the two are very similar but have
> >>>> completely different code paths.
> >>>>
> >>>> 6. Possibly make Catalyst, Dataset, and DataFrame more general
> >>>> by moving them to other package(s). They are already used beyond
> >>>> SQL, e.g. in ML pipelines, and will be used by streaming as well.
> >>>>
> >>>>
> >>>> Operation/Deployment
> >>>>
> >>>> 1. Scala 2.11 as the default build. We should still support
> >>>> Scala 2.10, but it has reached end-of-life.
> >>>>
> >>>> 2. Remove Hadoop 1 support.
> >>>>
> >>>> 3. Assembly-free distribution of Spark: don't require building
> >>>> an enormous assembly jar in order to run Spark.
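
To make API point 3 in Reynold's list concrete: in the Spark 1.x Java
API, the outer-join methods expose Guava's Optional type directly in
their return signatures, so user applications are pinned to whatever
Guava version Spark ships. A minimal sketch, assuming the Spark 1.x
Java API (the countsByUser/lastLoginByUser RDDs and the wrapper class
are hypothetical):

    import com.google.common.base.Optional;        // Guava type leaked into the public API
    import org.apache.spark.api.java.JavaPairRDD;
    import scala.Tuple2;

    public class GuavaLeakSketch {
        // In Spark 1.x, leftOuterJoin's return type embeds Guava's Optional,
        // so this signature must name a Guava class even though the caller
        // never asked for Guava. A conflicting Guava version on the user's
        // classpath can then break the application at runtime.
        static JavaPairRDD<String, Tuple2<Integer, Optional<Long>>> joinWithLastLogin(
                JavaPairRDD<String, Integer> countsByUser,
                JavaPairRDD<String, Long> lastLoginByUser) {
            return countsByUser.leftOuterJoin(lastLoginByUser);
        }
    }

Removing Guava from these signatures means swapping in a Spark-owned
(or JDK) Optional type, which is a binary-incompatible change and so
fits naturally in a major release rather than a 1.x minor release.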