Re: time for Apache Spark 3.0?

Reynold Xin Mon, 12 Nov 2018 13:57:39 -0800

Master branch now tracks 3.0.0-SHAPSHOT version, so the next one will be
3.0. In terms of time lining, unless we change anything specifically, Spark
feature releases are on a 6-mo cadence. Spark 2.4 was just released last
week, so 3.0 will be roughly 6 month from now.


On Mon, Nov 12, 2018 at 1:54 PM Vinoo Ganesh <[email protected]> wrote:

> Quickly following up on this – is there a target date for when Spark 3.0
> may be released and/or a list of the likely api breaks that are
> anticipated?
>
>
>
> *From: *Xiao Li <[email protected]>
> *Date: *Saturday, September 29, 2018 at 02:09
> *To: *Reynold Xin <[email protected]>
> *Cc: *Matei Zaharia <[email protected]>, Ryan Blue <
> [email protected]>, Mark Hamstra <[email protected]>, "
> [email protected]" <[email protected]>
> *Subject: *Re: time for Apache Spark 3.0?
>
>
>
> Yes. We should create a SPIP for each major breaking change.
>
>
>
> Reynold Xin <[email protected]> 于2018年9月28日周五 下午11:05写道：
>
> i think we should create spips for some of them, since they are pretty
> large ... i can create some tickets to start with
>
>
> --
>
> excuse the brevity and lower case due to wrist injury
>
>
>
>
>
> On Fri, Sep 28, 2018 at 11:01 PM Xiao Li <[email protected]> wrote:
>
> Based on the above discussions, we have a "rough consensus" that the next
> release will be 3.0. Now, we can start working on the API breaking changes
> (e.g., the ones mentioned in the original email from Reynold).
>
>
>
> Cheers,
>
>
>
> Xiao
>
>
>
> Matei Zaharia <[email protected]> 于2018年9月6日周四 下午2:21写道：
>
> Yes, you can start with Unstable and move to Evolving and Stable when
> needed. We’ve definitely had experimental features that changed across
> maintenance releases when they were well-isolated. If your change risks
> breaking stuff in stable components of Spark though, then it probably won’t
> be suitable for that.
>
> > On Sep 6, 2018, at 1:49 PM, Ryan Blue <[email protected]> wrote:
> >
> > I meant flexibility beyond the point releases. I think what Reynold was
> suggesting was getting v2 code out more often than the point releases every
> 6 months. An Evolving API can change in point releases, but maybe we should
> move v2 to Unstable so it can change more often? I don't really see another
> way to get changes out more often.
> >
> > On Thu, Sep 6, 2018 at 11:07 AM Mark Hamstra <[email protected]>
> wrote:
> > Yes, that is why we have these annotations in the code and the
> corresponding labels appearing in the API documentation: 
> https://github.com/apache/spark/blob/master/common/tags/src/main/java/org/apache/spark/annotation/InterfaceStability.java
> [github.com]
> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_apache_spark_blob_master_common_tags_src_main_java_org_apache_spark_annotation_InterfaceStability.java&d=DwMFaQ&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=7WzLIMu3WvZwd6AMPatqn1KZW39eI6c_oflAHIy1NUc&m=XgVDeB7pewN3jZ6po86BzIEmn1mgLmYtNGgcLZMQRjY&s=VSHC6Lqh_ewbLsLD69bdkRpXSeiR63uu3wOcHeJizbc&e=>
> >
> > As long as it is properly annotated, we can change or even eliminate an
> API method before the next major release. And frankly, we shouldn't be
> contemplating bringing in the DS v2 API (and, I'd argue, any new API)
> without such an annotation. There is just too much risk of not getting
> everything right before we see the results of the new API being more widely
> used, and too much cost in maintaining until the next major release
> something that we come to regret for us to create new API in a fully frozen
> state.
> >
> >
> > On Thu, Sep 6, 2018 at 9:49 AM Ryan Blue <[email protected]>
> wrote:
> > It would be great to get more features out incrementally. For
> experimental features, do we have more relaxed constraints?
> >
> > On Thu, Sep 6, 2018 at 9:47 AM Reynold Xin <[email protected]> wrote:
> > +1 on 3.0
> >
> > Dsv2 stable can still evolve in across major releases. DataFrame,
> Dataset, dsv1 and a lot of other major features all were developed
> throughout the 1.x and 2.x lines.
> >
> > I do want to explore ways for us to get dsv2 incremental changes out
> there more frequently, to get feedback. Maybe that means we apply additive
> changes to 2.4.x; maybe that means making another 2.5 release sooner. I
> will start a separate thread about it.
> >
> >
> >
> > On Thu, Sep 6, 2018 at 9:31 AM Sean Owen <[email protected]> wrote:
> > I think this doesn't necessarily mean 3.0 is coming soon (thoughts on
> timing? 6 months?) but simply next. Do you mean you'd prefer that change to
> happen before 3.x? if it's a significant change, seems reasonable for a
> major version bump rather than minor. Is the concern that tying it to 3.0
> means you have to take a major version update to get it?
> >
> > I generally support moving on to 3.x so we can also jettison a lot of
> older dependencies, code, fix some long standing issues, etc.
> >
> > (BTW Scala 2.12 support, mentioned in the OP, will go in for 2.4)
> >
> > On Thu, Sep 6, 2018 at 9:10 AM Ryan Blue <[email protected]>
> wrote:
> > My concern is that the v2 data source API is still evolving and not very
> close to stable. I had hoped to have stabilized the API and behaviors for a
> 3.0 release. But we could also wait on that for a 4.0 release, depending on
> when we think that will be.
> >
> > Unless there is a pressing need to move to 3.0 for some other area, I
> think it would be better for the v2 sources to have a 2.5 release.
> >
> > On Thu, Sep 6, 2018 at 8:59 AM Xiao Li <[email protected]> wrote:
> > Yesterday, the 2.4 branch was created. Based on the above discussion, I
> think we can bump the master branch to 3.0.0-SNAPSHOT. Any concern?
> >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: [email protected]
>
>

Re: time for Apache Spark 3.0?

Reply via email to