Agree. If it is deprecated, get rid of it in 2.0. If the deprecation was a mistake, let's fix that.

Suds

Sent from my iPhone
On Nov 10, 2015, at 5:04 PM, Reynold Xin <r...@databricks.com> wrote:

Maybe a better idea is to un-deprecate an API if it is too important to not be removed.

I don't think we can drop Java 7 support. It's way too soon.

On Tue, Nov 10, 2015 at 4:59 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
> Really, Sandy? "Extra consideration" even for an already-deprecated API? If we're not going to remove these with a major version change, then just when will we remove them?
>
> On Tue, Nov 10, 2015 at 4:53 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>> Another +1 to Reynold's proposal.
>>
>> Maybe this is obvious, but I'd like to advocate against a blanket removal of deprecated / developer APIs. Many APIs can likely be removed without material impact (e.g. the SparkContext constructor that takes preferred node location data), while others likely see heavier usage (e.g. I wouldn't be surprised if mapPartitionsWithContext was baked into a number of apps) and merit a little extra consideration.
>>
>> Maybe also obvious, but I think a migration guide with API equivalents and the like would be incredibly useful in easing the transition.
>>
>> -Sandy
>>
>> On Tue, Nov 10, 2015 at 4:28 PM, Reynold Xin <r...@databricks.com> wrote:
>>> Echoing Shivaram here. I don't think it makes a lot of sense to add more features to the 1.x line. We should still do critical bug fixes though.
>>>
>>> On Tue, Nov 10, 2015 at 4:23 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
>>>> +1
>>>>
>>>> On a related note I think making it lightweight will ensure that we stay on the current release schedule and don't unnecessarily delay 2.0 to wait for new features / big architectural changes.
>>>>
>>>> In terms of fixes to 1.x, I think our current policy of back-porting fixes to older releases would still apply. I don't think developing new features on both 1.x and 2.x makes a lot of sense as we would like users to switch to 2.x.
>>>>
>>>> Shivaram
>>>>
>>>> On Tue, Nov 10, 2015 at 4:02 PM, Kostas Sakellis <kos...@cloudera.com> wrote:
>>>> > +1 on a lightweight 2.0
>>>> >
>>>> > What is the thinking around the 1.x line after Spark 2.0 is released? If not terminated, how will we determine what goes into each major version line? Will 1.x only be for stability fixes?
>>>> >
>>>> > Thanks,
>>>> > Kostas
>>>> >
>>>> > On Tue, Nov 10, 2015 at 3:41 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>>> >> I also feel the same as Reynold. I agree we should minimize API breaks and focus on fixing things around the edge that were mistakes (e.g. exposing Guava and Akka) rather than any overhaul that could fragment the community. Ideally a major release is a lightweight process we can do every couple of years, with minimal impact for users.
>>>> >>
>>>> >> - Patrick
>>>> >>
>>>> >> On Tue, Nov 10, 2015 at 3:35 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>>> >>> > For this reason, I would *not* propose doing major releases to break substantial APIs or perform large re-architecting that prevents users from upgrading. Spark has always had a culture of evolving architecture incrementally and making changes - and I don't think we want to change this model.
>>>> >>>
>>>> >>> +1 for this.
>>>> >>> The Python community went through a lot of turmoil over the Python 2 -> Python 3 transition because the upgrade process was too painful for too long. The Spark community will benefit greatly from our explicitly looking to avoid a similar situation.
>>>> >>>
>>>> >>> > 3. Assembly-free distribution of Spark: don't require building an enormous assembly jar in order to run Spark.
>>>> >>>
>>>> >>> Could you elaborate a bit on this? I'm not sure what an assembly-free distribution means.
>>>> >>>
>>>> >>> Nick
>>>> >>>
>>>> >>> On Tue, Nov 10, 2015 at 6:11 PM Reynold Xin <r...@databricks.com> wrote:
>>>> >>>> I'm starting a new thread since the other one got intermixed with feature requests. Please refrain from making feature requests in this thread. Not that we shouldn't be adding features, but we can always add features in 1.7, 2.1, 2.2, ...
>>>> >>>>
>>>> >>>> First - I want to propose a premise for how to think about Spark 2.0 and major releases in Spark, based on discussion with several members of the community: a major release should be low overhead and minimally disruptive to the Spark community. A major release should not be very different from a minor release and should not be gated based on new features. The main purpose of a major release is an opportunity to fix things that are broken in the current API and remove certain deprecated APIs (examples follow).
>>>> >>>>
>>>> >>>> For this reason, I would *not* propose doing major releases to break substantial APIs or perform large re-architecting that prevents users from upgrading. Spark has always had a culture of evolving architecture incrementally and making changes - and I don't think we want to change this model. In fact, we've released many architectural changes on the 1.X line.
>>>> >>>>
>>>> >>>> If the community likes the above model, then to me it seems reasonable to do Spark 2.0 either after Spark 1.6 (in lieu of Spark 1.7) or immediately after Spark 1.7. It will be 18 or 21 months since Spark 1.0. A cadence of major releases every 2 years seems doable within the above model.
>>>> >>>>
>>>> >>>> Under this model, here is a list of example things I would propose doing in Spark 2.0, separated into APIs and Operation/Deployment:
>>>> >>>>
>>>> >>>> APIs
>>>> >>>>
>>>> >>>> 1. Remove interfaces, configs, and modules (e.g. Bagel) deprecated in Spark 1.x.
>>>> >>>>
>>>> >>>> 2. Remove Akka from Spark's API dependency (in streaming), so user applications can use Akka (SPARK-5293). We have gotten a lot of complaints about user applications being unable to use Akka due to Spark's dependency on Akka.
>>>> >>>>
>>>> >>>> 3. Remove Guava from Spark's public API (JavaRDD Optional).
>>>> >>>>
>>>> >>>> 4. Better class package structure for low-level developer APIs. In particular, we have some DeveloperApi (mostly various listener-related classes) added over the years. Some packages include only one or two public classes but a lot of private classes.
>>>> >>>> A better structure is to have public classes isolated to a few public packages, and these public packages should have minimal private classes for low-level developer APIs.
>>>> >>>>
>>>> >>>> 5. Consolidate the task metric and accumulator APIs. Although they have some subtle differences, the two are very similar but have completely different code paths.
>>>> >>>>
>>>> >>>> 6. Possibly making Catalyst, Dataset, and DataFrame more general by moving them to other package(s). They are already used beyond SQL, e.g. in ML pipelines, and will be used by streaming also.
>>>> >>>>
>>>> >>>> Operation/Deployment
>>>> >>>>
>>>> >>>> 1. Scala 2.11 as the default build. We should still support Scala 2.10, but it has reached end-of-life.
>>>> >>>>
>>>> >>>> 2. Remove Hadoop 1 support.
>>>> >>>>
>>>> >>>> 3. Assembly-free distribution of Spark: don't require building an enormous assembly jar in order to run Spark.
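
To make Sandy's migration-guide suggestion concrete, here is a minimal sketch of the kind of entry such a guide could contain, using the deprecated mapPartitionsWithContext he mentions as the example. The replacement shown (fetching the context inside the closure via TaskContext.get()) is the route the 1.x deprecation notice points at; the job itself is an illustrative assumption, not code taken from this thread.

```scala
import org.apache.spark.{SparkConf, SparkContext, TaskContext}

object MapPartitionsMigration {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("migration-sketch").setMaster("local[*]"))
    val rdd = sc.parallelize(1 to 100, 4)

    // Spark 1.x, deprecated: the TaskContext is handed to the closure directly.
    // val tagged = rdd.mapPartitionsWithContext { (ctx, iter) =>
    //   iter.map(x => (ctx.partitionId(), x))
    // }

    // Equivalent code that survives the removal in 2.0: obtain the context
    // from inside the closure via TaskContext.get().
    val tagged = rdd.mapPartitions { iter =>
      val ctx = TaskContext.get()
      iter.map(x => (ctx.partitionId(), x))
    }

    tagged.take(5).foreach(println)
    sc.stop()
  }
}
```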
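On API item 5 (consolidating task metrics and accumulators), the user-facing half of that overlap is the accumulator API below. This is a self-contained sketch of 1.x accumulator usage, included only to show the surface the consolidation would touch; the counter name and sample data are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("accumulator-sketch").setMaster("local[*]"))

    // A named accumulator: a driver-visible counter that each task increments.
    // Task metrics track very similar per-task counts through a separate code
    // path, which is the duplication item 5 proposes to consolidate.
    val emptyLines = sc.accumulator(0L, "empty lines")

    val lines = sc.parallelize(Seq("spark", "", "2.0", "", "thread"))
    val lengths = lines.map { line =>
      if (line.isEmpty) emptyLines += 1L
      line.length
    }

    lengths.count()  // accumulator values are only reliable after an action runs
    println(s"empty lines seen: ${emptyLines.value}")
    sc.stop()
  }
}
```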
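On Operation/Deployment item 1, the default-build change mostly shows up for application authors in the Scala binary-version suffix of the published artifacts. A brief build.sbt sketch, with illustrative version numbers that are assumptions rather than anything stated in the thread:

```scala
// build.sbt -- version numbers are illustrative only
scalaVersion := "2.11.7"

// The %% operator appends the Scala binary suffix, so this resolves to
// spark-core_2.11 once 2.11 builds are published as the default.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
```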