Re: time for Apache Spark 3.0?

2018-11-13 Thread Matt Cheah
To: Sean Owen, Vinoo Ganesh, dev. Subject: Re: time for Apache Spark 3.0? As far as I know, any JIRA that has implications for users is tagged this way, but I haven't examined all of them. All that are going in for 3.0 should have it as Fix Version. Most changes won't have a user-visible impact

Re: time for Apache Spark 3.0?

2018-11-13 Thread Sean Owen
As far as I know, any JIRA that has implications for users is tagged this way, but I haven't examined all of them. All that are going in for 3.0 should have it as Fix Version. Most changes won't have a user-visible impact. Do you see any that seem to need the tag? Call them out, or even fix them by

Re: time for Apache Spark 3.0?

2018-11-13 Thread Matt Cheah
The release-notes label on JIRA sounds good. Can we make it a point to have that done retroactively now, and going forward? On 11/12/18, 4:01 PM, "Sean Owen" wrote: My non-definitive takes -- I would personally like to remove all deprecated methods for Spark 3. I

Re: time for Apache Spark 3.0?

2018-11-12 Thread Sean Owen
My non-definitive takes -- I would personally like to remove all deprecated methods for Spark 3. I started by removing 'old' deprecated methods in that commit. Things deprecated in 2.4 are maybe less clear-cut as to whether they should be removed. Everything's fair game for removal or change in a major
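
A minimal sketch of the mechanics under discussion, with hypothetical method names rather than actual Spark API: anything still carrying Scala's @deprecated annotation through the 2.x line is a candidate for deletion in 3.0.

    object LegacyApi {
      // Deprecated since 2.0.0; under the proposal above, deleted in 3.0.
      @deprecated("Use wordCount instead", "2.0.0")
      def countWords(text: String): Int = wordCount(text)

      // The replacement that stays in the 3.0 API.
      def wordCount(text: String): Int =
        text.split("\\s+").count(_.nonEmpty)
    }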

Re: time for Apache Spark 3.0?

2018-11-12 Thread Reynold Xin
Cc: Xiao Li, Matei Zaharia <matei.zaha...@gmail.com>, Ryan Blue, Mark Hamstra <m...@clearstorydata.com>, dev. Subject: Re: time for Apache Spark 3.0? > Makes sense, thanks Reynold. > From: Reynold Xin. Date: Monday, November 12

Re: time for Apache Spark 3.0?

2018-11-12 Thread Matt Cheah
Subject: Re: time for Apache Spark 3.0? Makes sense, thanks Reynold. From: Reynold Xin. Date: Monday, November 12, 2018 at 16:57. To: Vinoo Ganesh. Cc: Xiao Li, Matei Zaharia, Ryan Blue, Mark Hamstra, dev. Subject: Re: time for Apache Spark 3.0? Master branch now tracks 3.0.0

Re: time for Apache Spark 3.0?

2018-11-12 Thread Vinoo Ganesh
Makes sense, thanks Reynold. From: Reynold Xin. Date: Monday, November 12, 2018 at 16:57. To: Vinoo Ganesh. Cc: Xiao Li, Matei Zaharia, Ryan Blue, Mark Hamstra, dev. Subject: Re: time for Apache Spark 3.0? Master branch now tracks the 3.0.0-SNAPSHOT version, so the next one will be 3.0

Re: time for Apache Spark 3.0?

2018-11-12 Thread Reynold Xin
Xin. Cc: Matei Zaharia, Ryan Blue <rb...@netflix.com>, Mark Hamstra, "u...@spark.apache.org". Subject: Re: time for Apache Spark 3.0? > Yes. We should create a SPIP for each major breaking change. > Reynold Xin wrote on 201

Re: time for Apache Spark 3.0?

2018-11-12 Thread Vinoo Ganesh
e.org" Subject: Re: time for Apache Spark 3.0? Yes. We should create a SPIP for each major breaking change. Reynold Xin mailto:r...@databricks.com>> 于2018年9月28日周五 下午11:05写道: i think we should create spips for some of them, since they are pretty large ... i can create some

Re: time for Apache Spark 3.0?

2018-09-29 Thread Xiao Li
Yes. We should create a SPIP for each major breaking change. Reynold Xin wrote on Fri, Sep 28, 2018 at 11:05 PM: > i think we should create spips for some of them, since they are pretty > large ... i can create some tickets to start with > > -- > excuse the brevity and lower case due to wrist injury > > > On

Re: time for Apache Spark 3.0?

2018-09-29 Thread Reynold Xin
i think we should create spips for some of them, since they are pretty large ... i can create some tickets to start with -- excuse the brevity and lower case due to wrist injury On Fri, Sep 28, 2018 at 11:01 PM Xiao Li wrote: > Based on the above discussions, we have a "rough consensus" that

Re: time for Apache Spark 3.0?

2018-09-29 Thread Xiao Li
Based on the above discussions, we have a "rough consensus" that the next release will be 3.0. Now, we can start working on the API-breaking changes (e.g., the ones mentioned in the original email from Reynold). Cheers, Xiao Matei Zaharia wrote on Thu, Sep 6, 2018 at 2:21 PM: > Yes, you can start with

Re: time for Apache Spark 3.0?

2018-09-06 Thread Matei Zaharia
Yes, you can start with Unstable and move to Evolving and Stable when needed. We’ve definitely had experimental features that changed across maintenance releases when they were well-isolated. If your change risks breaking stuff in stable components of Spark though, then it probably won’t be

Re: time for Apache Spark 3.0?

2018-09-06 Thread Ryan Blue
I meant flexibility beyond the point releases. I think what Reynold was suggesting was getting v2 code out more often than the point releases every 6 months. An Evolving API can change in point releases, but maybe we should move v2 to Unstable so it can change more often? I don't really see

Re: time for Apache Spark 3.0?

2018-09-06 Thread Mark Hamstra
Yes, that is why we have these annotations in the code and the corresponding labels appearing in the API documentation: https://github.com/apache/spark/blob/master/common/tags/src/main/java/org/apache/spark/annotation/InterfaceStability.java As long as it is properly annotated, we can change or
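
A sketch of how those annotations are applied, on a hypothetical interface; the annotation names and package come from the InterfaceStability file linked above. Unstable promises nothing, Evolving may still change between minor releases, and Stable breaks only across major releases.

    import org.apache.spark.annotation.InterfaceStability

    // Hypothetical v2-style interface, not actual Spark code. Swapping this
    // annotation for .Evolving or .Stable is what changes the compatibility
    // contract advertised in the API docs.
    @InterfaceStability.Unstable
    trait ExperimentalScan {
      def readSchema(): String
    }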

Re: time for Apache Spark 3.0?

2018-09-06 Thread sadhen
...@gmail.com. To: vaquar khan <vaquar.k...@gmail.com>. Cc: Reynold Xin <r...@databricks.com>; Mridul Muralidharan <mri...@gmail.com>; Mark Hamstra <m...@clearstorydata.com>; 银狐 <andyye...@gmail.com>; user@spark.apache.org; ...@spark.apache.org. Sent: Thursday, September 6, 2018, 23:59. Subject: Re: time for Apache Spark 3.0? Yesterday, the 2.4

Re: time for Apache Spark 3.0?

2018-09-06 Thread Ryan Blue
It would be great to get more features out incrementally. For experimental features, do we have more relaxed constraints? On Thu, Sep 6, 2018 at 9:47 AM Reynold Xin wrote: > +1 on 3.0 > > Dsv2 stable can still evolve across major releases. DataFrame, Dataset, > dsv1 and a lot of other major

Re: time for Apache Spark 3.0?

2018-09-06 Thread Reynold Xin
I definitely agree we shouldn't make dsv2 stable in the next release. On Thu, Sep 6, 2018 at 9:48 AM Ryan Blue wrote: > I definitely support moving to 3.0 to remove deprecations and update > dependencies. > > For the v2 work, we know that there will be major API changes and > standardization

Re: time for Apache Spark 3.0?

2018-09-06 Thread Ryan Blue
I definitely support moving to 3.0 to remove deprecations and update dependencies. For the v2 work, we know that there will be major API changes and standardization of behavior from the new logical plans going into the next release. I think it is a safe bet that this isn't going to be

Re: time for Apache Spark 3.0?

2018-09-06 Thread Reynold Xin
+1 on 3.0 Dsv2 stable can still evolve across major releases. DataFrame, Dataset, dsv1 and a lot of other major features all were developed throughout the 1.x and 2.x lines. I do want to explore ways for us to get dsv2 incremental changes out there more frequently, to get feedback. Maybe that

Re: time for Apache Spark 3.0?

2018-09-06 Thread Sean Owen
I think this doesn't necessarily mean 3.0 is coming soon (thoughts on timing? 6 months?) but simply next. Do you mean you'd prefer that change to happen before 3.x? If it's a significant change, it seems reasonable for a major version bump rather than a minor one. Is the concern that tying it to 3.0 means

Re: time for Apache Spark 3.0?

2018-09-06 Thread Ryan Blue
My concern is that the v2 data source API is still evolving and not very close to stable. I had hoped to have stabilized the API and behaviors for a 3.0 release. But we could also wait on that for a 4.0 release, depending on when we think that will be. Unless there is a pressing need to move to

Re: time for Apache Spark 3.0?

2018-09-06 Thread Xiao Li
Yesterday, the 2.4 branch was created. Based on the above discussion, I think we can bump the master branch to 3.0.0-SNAPSHOT. Any concerns? Thanks, Xiao vaquar khan wrote on Sat, Jun 16, 2018 at 10:21 AM: > +1 for 2.4 next, followed by 3.0. > > Where can we get the Apache Spark road map for 2.4 and 2.5

Re: time for Apache Spark 3.0?

2018-06-16 Thread vaquar khan
+1 for 2.4 next, followed by 3.0. Where can we get the Apache Spark road map for 2.4 and 2.5/3.0? Is it possible to share a proposed specification for future releases, in the same style as the existing release notes (https://spark.apache.org/releases/spark-release-2-3-0.html)? Regards, Vaquar khan On Sat, Jun 16, 2018 at

Re: time for Apache Spark 3.0?

2018-06-16 Thread vaquar khan
Please ignore the link (YouTube) in my last email; I'm not sure how it got added. Apologies, I'm not sure how to delete it. On Sat, Jun 16, 2018 at 11:58 AM, vaquar khan wrote: > +1 > > https://www.youtube.com/watch?v=-ik7aJ5U6kg > > Regards, > Vaquar khan > > On Fri, Jun 15, 2018 at 4:55 PM, Reynold Xin wrote: > >>

Re: time for Apache Spark 3.0?

2018-06-16 Thread vaquar khan
+1 https://www.youtube.com/watch?v=-ik7aJ5U6kg Regards, Vaquar khan On Fri, Jun 15, 2018 at 4:55 PM, Reynold Xin wrote: > Yes. At this rate I think it's better to do 2.4 next, followed by 3.0. > > > On Fri, Jun 15, 2018 at 10:52 AM Mridul Muralidharan wrote: > >> I agree, I don't see

Re: time for Apache Spark 3.0?

2018-06-16 Thread Xiao Li
+1 2018-06-15 14:55 GMT-07:00 Reynold Xin: > Yes. At this rate I think it's better to do 2.4 next, followed by 3.0. > > > On Fri, Jun 15, 2018 at 10:52 AM Mridul Muralidharan wrote: > >> I agree, I don't see a pressing need for a major version bump either. >> >> >> Regards, >> Mridul >> On Fri,

Re: time for Apache Spark 3.0?

2018-06-15 Thread Reynold Xin
Yes. At this rate I think it's better to do 2.4 next, followed by 3.0. On Fri, Jun 15, 2018 at 10:52 AM Mridul Muralidharan wrote: > I agree, I don't see a pressing need for a major version bump either. > > > Regards, > Mridul > On Fri, Jun 15, 2018 at 10:25 AM Mark Hamstra wrote: > > >

Re: time for Apache Spark 3.0?

2018-06-15 Thread Mridul Muralidharan
I agree, I don't see a pressing need for a major version bump either. Regards, Mridul On Fri, Jun 15, 2018 at 10:25 AM Mark Hamstra wrote: > > Changing major version numbers is not about new features or a vague notion > that it is time to do something that will be seen to be a significant >

Re: time for Apache Spark 3.0?

2018-06-15 Thread Mark Hamstra
Changing major version numbers is not about new features or a vague notion that it is time to do something that will be seen to be a significant release. It is about breaking stable public APIs. I still remain unconvinced that the next version can't be 2.4.0. On Fri, Jun 15, 2018 at 1:34 AM Andy

Re: time for Apache Spark 3.0?

2018-06-15 Thread Andy
*Dear all:* It has been 2 months since this topic was proposed. Any progress? About half of 2018 has already passed. I agree that the new version should include some exciting new features. How about this one: *6. ML/DL framework to be integrated as a core component and feature. (Such as Angel /

Re: time for Apache Spark 3.0?

2018-04-19 Thread Sean Owen
That certainly sounds beneficial, maybe to several other projects too. If there's no downside and it takes away the API issues, it seems like a win. On Thu, Apr 19, 2018 at 5:28 AM Dean Wampler wrote: > I spoke with Martin Odersky and Lightbend's Scala Team about the known API >

Re: time for Apache Spark 3.0?

2018-04-19 Thread Dean Wampler
I spoke with Martin Odersky and Lightbend's Scala Team about the known API issue with method disambiguation. They offered to implement a small patch in a new release of Scala 2.12 to handle the issue without requiring a Spark API change. They would cut a 2.12.6 release for it. I'm told that Scala

Re: time for Apache Spark 3.0?

2018-04-05 Thread Marcelo Vanzin
On Thu, Apr 5, 2018 at 10:30 AM, Matei Zaharia wrote: > Sorry, but just to be clear here, this is the 2.12 API issue: > https://issues.apache.org/jira/browse/SPARK-14643, with more details in this > doc: >

Re: time for Apache Spark 3.0?

2018-04-05 Thread Steve Loughran
On 5 Apr 2018, at 18:04, Matei Zaharia wrote: Java 9/10 support would be great to add as well. Be aware that moving Hadoop core to Java 9+ is still a big piece of work being undertaken by Akira Ajisaka & colleagues at NTT

Re: time for Apache Spark 3.0?

2018-04-05 Thread Matei Zaharia
Oh, forgot to add, but splitting the source tree in Scala also creates a big maintenance burden for third-party libraries built on Spark. As Josh said on the JIRA: "I think this is primarily going to be an issue for end users who want to use an existing source tree to

Re: time for Apache Spark 3.0?

2018-04-05 Thread Matei Zaharia
Sorry, but just to be clear here, this is the 2.12 API issue: https://issues.apache.org/jira/browse/SPARK-14643, with more details in this doc: https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit. Basically, if we are allowed to change Spark’s API a little to
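
A toy sketch of the ambiguity in question; the type and method names here are made up, while the real overloads live on Dataset and use Spark's Java MapFunction type.

    // One overload takes a Scala function, the other a Java-style
    // single-abstract-method (SAM) interface.
    trait ToyMapFunction[T, U] { def call(value: T): U }

    class ToyDataset[T] {
      def map[U](f: T => U): ToyDataset[U] = new ToyDataset[U]
      def map[U](f: ToyMapFunction[T, U]): ToyDataset[U] = new ToyDataset[U]
    }

    // On Scala 2.11 a lambda only matches the T => U overload. On 2.12,
    // lambdas also convert to SAM types, so a call like the one below can
    // be rejected as ambiguous -- the breakage SPARK-14643 tracks:
    // new ToyDataset[Int].map(x => x + 1)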

Re: time for Apache Spark 3.0?

2018-04-05 Thread Marcelo Vanzin
I remember seeing somewhere that Scala still has some issues with Java 9/10 so that might be hard... But on that topic, it might be better to shoot for Java 11 compatibility. 9 and 10, following the new release model, aren't really meant to be long-term releases. In general, agree with Sean

Re: time for Apache Spark 3.0?

2018-04-05 Thread Matei Zaharia
Java 9/10 support would be great to add as well. Regarding Scala 2.12, I thought that supporting it would become easier if we changed the Spark API and ABI slightly. Basically, it is of course possible to create an alternate source tree today, but it might be possible to share the same source

Re: time for Apache Spark 3.0?

2018-04-05 Thread Marco Gaido
Hi all, I also agree with Mark that we should add Java 9/10 support to an eventual Spark 3.0 release, because supporting Java 9 is not a trivial task: we are using some internal APIs for memory management which have changed. Either we find a solution which works on both (but I am not sure it
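
A sketch of the kind of internal-API breakage meant here, assuming the usual culprit of reflective access to JDK-internal classes: sun.misc.Cleaner, used when freeing direct buffers, moved to jdk.internal.ref.Cleaner in Java 9, so "a solution which works on both" has to probe for both names.

    object CleanerProbe extends App {
      // Try the Java 8 name first, then fall back to the Java 9+ name.
      val cleanerClass: Class[_] =
        try Class.forName("sun.misc.Cleaner") // Java 8
        catch {
          case _: ClassNotFoundException =>
            Class.forName("jdk.internal.ref.Cleaner") // Java 9+
        }
      println(s"Direct-buffer cleaner: ${cleanerClass.getName}")
    }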

Re: time for Apache Spark 3.0?

2018-04-05 Thread Mark Hamstra
As with Sean, I'm not sure that this will require a new major version, but we should also be looking at Java 9 & 10 support -- particularly with regard to their better functionality in a containerized environment (memory limits from cgroups, not sysconf; support for cpusets). In that regard, we
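
The cgroup awareness mentioned here shipped behind -XX:+UseContainerSupport in JDK 10 and was later backported to 8u191. A quick way to observe the difference, as a sketch to run inside a memory-limited container:

    object HeapCheck extends App {
      // A cgroup-aware JVM derives the default max heap from the
      // container's memory limit; an older JVM sizes it from host RAM.
      val maxHeapMiB = Runtime.getRuntime.maxMemory / (1024L * 1024L)
      println(s"JVM max heap: $maxHeapMiB MiB")
    }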

Re: time for Apache Spark 3.0?

2018-04-05 Thread Sean Owen
On Wed, Apr 4, 2018 at 6:20 PM Reynold Xin wrote: > The primary motivating factor IMO for a major version bump is to support > Scala 2.12, which requires minor API breaking changes to Spark’s APIs. > Similar to Spark 2.0, I think there are also opportunities for other >

time for Apache Spark 3.0?

2018-04-04 Thread Reynold Xin
There was a discussion thread on scala-contributors about Apache Spark not yet supporting Scala 2.12, and that got me thinking that perhaps it is about time for Spark to work towards the 3.0 release. By the