My concern is that the v2 data source API is still evolving and not very close to stable. I had hoped to have stabilized the API and behaviors for a 3.0 release. But we could also wait on that for a 4.0 release, depending on when we think that will be.
Unless there is a pressing need to move to 3.0 for some other area, I think it would be better for the v2 sources to have a 2.5 release.

On Thu, Sep 6, 2018 at 8:59 AM Xiao Li <gatorsm...@gmail.com> wrote:

> Yesterday, the 2.4 branch was created. Based on the above discussion, I think we can bump the master branch to 3.0.0-SNAPSHOT. Any concerns?
>
> Thanks,
>
> Xiao
>
> vaquar khan <vaquar.k...@gmail.com> wrote on Sat, Jun 16, 2018 at 10:21 AM:
>
>> +1 for 2.4 next, followed by 3.0.
>>
>> Where can we get the Apache Spark road map for 2.4 and 2.5 ... 3.0? Is it possible to share a proposed specification for future releases, the same way it is done for published releases (https://spark.apache.org/releases/spark-release-2-3-0.html)?
>>
>> Regards,
>> Vaquar khan
>>
>> On Sat, Jun 16, 2018 at 12:02 PM, vaquar khan <vaquar.k...@gmail.com> wrote:
>>
>>> Please ignore the link (YouTube) in my last email; I am not sure how it got added. Apologies, I am not sure how to delete it.
>>>
>>> On Sat, Jun 16, 2018 at 11:58 AM, vaquar khan <vaquar.k...@gmail.com> wrote:
>>>
>>>> +1
>>>>
>>>> https://www.youtube.com/watch?v=-ik7aJ5U6kg
>>>>
>>>> Regards,
>>>> Vaquar khan
>>>>
>>>> On Fri, Jun 15, 2018 at 4:55 PM, Reynold Xin <r...@databricks.com> wrote:
>>>>
>>>>> Yes. At this rate I think it's better to do 2.4 next, followed by 3.0.
>>>>>
>>>>> On Fri, Jun 15, 2018 at 10:52 AM Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>>
>>>>>> I agree, I don't see a pressing need for a major version bump either.
>>>>>>
>>>>>> Regards,
>>>>>> Mridul
>>>>>>
>>>>>> On Fri, Jun 15, 2018 at 10:25 AM Mark Hamstra <m...@clearstorydata.com> wrote:
>>>>>> >
>>>>>> > Changing major version numbers is not about new features or a vague notion that it is time to do something that will be seen as a significant release. It is about breaking stable public APIs.
>>>>>> >
>>>>>> > I still remain unconvinced that the next version can't be 2.4.0.
>>>>>> >
>>>>>> > On Fri, Jun 15, 2018 at 1:34 AM Andy <andyye...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> Dear all:
>>>>>> >>
>>>>>> >> It has been 2 months since this topic was proposed. Any progress? About half of 2018 has already passed.
>>>>>> >>
>>>>>> >> I agree that the new version should bring some exciting new features. How about this one:
>>>>>> >>
>>>>>> >> 6. ML/DL framework to be integrated as a core component and feature (such as Angel / BigDL / ……).
>>>>>> >>
>>>>>> >> 3.0 is a very important version for a good open source project. It would be better to leave the historical burden behind and focus on new areas. Spark has been widely used all over the world as a successful big data framework, and it can be better than that.
>>>>>> >>
>>>>>> >> Andy
>>>>>> >>
>>>>>> >> On Thu, Apr 5, 2018 at 7:20 AM Reynold Xin <r...@databricks.com> wrote:
>>>>>> >>>
>>>>>> >>> There was a discussion thread on scala-contributors about Apache Spark not yet supporting Scala 2.12, and that got me thinking that perhaps it is about time for Spark to work towards the 3.0 release. By the time it comes out, it will be more than 2 years since Spark 2.0.
>>>>>> >>>
>>>>>> >>> For contributors less familiar with Spark’s history, I want to give more context on Spark releases:
>>>>>> >>>
>>>>>> >>> 1. Timeline: Spark 1.0 was released in May 2014; Spark 2.0 in July 2016. If we were to maintain the ~2 year cadence, it is time to work on Spark 3.0 in 2018.
>>>>>> >>>
>>>>>> >>> 2. Spark’s versioning policy promises that Spark does not break stable APIs in feature releases (e.g. 2.1, 2.2). API-breaking changes are sometimes a necessary evil, and can be done in major releases (e.g. 1.6 to 2.0, 2.x to 3.0).
>>>>>> >>>
>>>>>> >>> 3. That said, a major version isn’t necessarily a playground for disruptive API changes that make it painful for users to update. The main purpose of a major release is an opportunity to fix things that are broken in the current API and remove certain deprecated APIs.
>>>>>> >>>
>>>>>> >>> 4. Spark as a project has a culture of evolving architecture and developing major new features incrementally, so major releases are not the only time for exciting new features. For example, the bulk of the work in the move towards the DataFrame API was done in Spark 1.3, and Continuous Processing was introduced in Spark 2.3. Both were feature releases rather than major releases.
>>>>>> >>>
>>>>>> >>> You can find more background in the thread discussing Spark 2.0: http://apache-spark-developers-list.1001551.n3.nabble.com/A-proposal-for-Spark-2-0-td15122.html
>>>>>> >>>
>>>>>> >>> The primary motivating factor, IMO, for a major version bump is to support Scala 2.12, which requires minor breaking changes to Spark’s APIs. Similar to Spark 2.0, I think there are also opportunities for other changes that we know have been biting us for a long time but can’t be changed in feature releases (to be clear, I’m actually not sure they are all good ideas, but I’m writing them down as candidates for consideration):
>>>>>> >>>
>>>>>> >>> 1. Support Scala 2.12.
>>>>>> >>>
>>>>>> >>> 2. Remove interfaces, configs, and modules (e.g. Bagel) deprecated in Spark 2.x.
>>>>>> >>>
>>>>>> >>> 3. Shade all dependencies.
>>>>>> >>>
>>>>>> >>> 4. Change the reserved keywords in Spark SQL to be more ANSI-SQL compliant, to prevent users from shooting themselves in the foot, e.g. “SELECT 2 SECOND” -- is “SECOND” an interval unit or an alias? (This ambiguity is illustrated in the short snippet at the end of this thread.) To make it less painful for users to upgrade here, I’d suggest creating a flag for a backward compatibility mode.
>>>>>> >>>
>>>>>> >>> 5. Similar to 4, make our type coercion rules in DataFrame/SQL more standard compliant, and have a flag for backward compatibility.
>>>>>> >>>
>>>>>> >>> 6. Miscellaneous other small changes documented in JIRA already (e.g. “JavaPairRDD flatMapValues requires function returning Iterable, not Iterator”, “Prevent column name duplication in temporary view”).
>>>>>> >>>
>>>>>> >>> Now, the reality of a major version bump is that the world often thinks in terms of what exciting features are coming. I do think there are a number of major changes happening already that can be part of the 3.0 release, if they make it in:
>>>>>> >>>
>>>>>> >>> 1. Scala 2.12 support (listing it twice)
>>>>>> >>> 2. Continuous Processing non-experimental
>>>>>> >>> 3. Kubernetes support non-experimental
>>>>>> >>> 4. A more fleshed-out version of the data source API v2 (I don’t think it is realistic to stabilize that in one release)
>>>>>> >>> 5. Hadoop 3.0 support
>>>>>> >>> 6. ...
>>>>>> >>>
>>>>>> >>> Similar to the 2.0 discussion, this thread should focus on the framework and whether it’d make sense to create Spark 3.0 as the next release, rather than the individual feature requests. Those are important, but are best done in their own separate threads.
>>>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Vaquar Khan
>>>> +1 -224-436-0783
>>>> Greater Chicago
>>

--
Ryan Blue
Software Engineer
Netflix
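For readers who want to see the reserved-keyword ambiguity from item 4 of Reynold's list of candidate changes concretely, here is a minimal sketch. It assumes a local SparkSession and today's (2.x) parser behavior; the object name and exact queries are illustrative only, not a specification of the proposed change.

    import org.apache.spark.sql.SparkSession

    object ReservedKeywordAmbiguity {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("reserved-keyword-ambiguity")
          .master("local[*]")
          .getOrCreate()

        // Today SECOND is not reserved, so this parses as the literal 2
        // aliased (without AS) to a column named "SECOND".
        spark.sql("SELECT 2 SECOND").show()

        // A reader expecting ANSI-style behavior might assume an interval of
        // two seconds, which currently has to be written explicitly:
        spark.sql("SELECT current_timestamp() + INTERVAL 2 SECONDS").show(false)

        spark.stop()
      }
    }

If SECOND became reserved, the first query would lose its alias reading, which is why the proposal pairs the keyword change with a backward-compatibility flag.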