Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-13 Thread Sean Owen
These two are coupled, and in tension: we don't want to take much change, but we do want changes that will unfortunately be somewhat breaking. A 2.5 release with these items would be different enough to strain the general level of compatibility implied by a minor release. Sure, it's not 'just' a

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-13 Thread DB Tsai
For example, JDK11 requires dependency changes which cannot go into 2.4.7. Recent development on Kube, such as supporting dynamic allocation in Spark 3.0 on Kube (without shuffle service), will be hard to get into 2.4.7. Sent from my iPhone > On Jun 12, 2020, at 11:50 PM, Reynold Xin wrote: >
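For context, the shuffle-service-free dynamic allocation mentioned above works in Spark 3.0 by tracking shuffle files on the executors themselves. A minimal configuration sketch (property names as in the Spark 3.0 configuration; treat the executor counts as illustrative):

```properties
# Spark 3.0 on Kubernetes: dynamic allocation without an external shuffle service.
# Shuffle tracking keeps executors holding shuffle data alive instead of
# relying on a shuffle service to serve their output after removal.
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=true
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=10
```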

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-13 Thread Reynold Xin
Echoing Sean's earlier comment … What is the functionality that would go into a 2.5.0 release, that can't be in a 2.4.7 release? On Fri, Jun 12, 2020 at 11:14 PM, Holden Karau < hol...@pigscanfly.ca > wrote: > > Can I suggest we maybe decouple this conversation a bit? First, if there > is an

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-13 Thread Holden Karau
Can I suggest we maybe decouple this conversation a bit? First, see whether there is agreement in principle on making a transitional release, and then folks who feel strongly about specific backports can have their respective discussions. It's not like we normally know or have agreement on everything

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Reynold Xin
I understand the argument to add JDK 11 support just to extend the EOL, but the other things seem kind of arbitrary and are not supported by your arguments, especially DSv2, which is a massive change. DSv2, IIUC, is not API-stable yet and will continue to evolve in the 3.x line. Spark is designed in

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread DB Tsai
+1 for a 2.x release with DSv2, JDK11, and Scala 2.11 support. We had an internal preview version of Spark 3.0 for our customers to try out for a while, and then we realized that it's very challenging for enterprise applications in production to move to Spark 3.0. For example, many of our

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Jungtaek Lim
I guess we already went through the same discussion, right? If anyone missed it, please go through the discussion thread. [1] The consensus did not look positive on migrating the new DSv2 into the Spark 2.x version line, because the change is pretty huge and also backward incompatible. What I

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Ryan Blue
+1 for a 2.x release with a DSv2 API that matches 3.0. There are a lot of big differences between the API in 2.4 and 3.0, and I think a release to help migrate would be beneficial to organizations like ours that will be supporting 2.x and 3.0 in parallel for quite a while. Migration to Spark 3 is
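To illustrate the kind of structural difference being discussed: in 2.4 a DSv2 source exposed reader/writer support on the source class itself, while in 3.0 a provider hands back a Table that declares its capabilities. The following is a simplified, hypothetical mirror of those shapes, not the real Spark interfaces:

```scala
// Hypothetical, simplified mirror of the DSv2 shape change between
// Spark 2.4 and 3.0 -- NOT the actual Spark API.

// 2.4-style: the source class itself exposes read support directly.
trait LegacyDataSourceV2 {
  def createReader(options: Map[String, String]): String // stand-in for DataSourceReader
}

// 3.0-style: a provider returns a Table, and the Table declares
// what it can do via capabilities (stand-in for TableCapability).
trait Table {
  def name: String
  def capabilities: Set[String]
}
trait TableProvider {
  def getTable(options: Map[String, String]): Table
}

object Demo {
  class MyTable extends Table {
    def name: String = "my_table"
    def capabilities: Set[String] = Set("BATCH_READ")
  }
  class MyProvider extends TableProvider {
    def getTable(options: Map[String, String]): Table = new MyTable
  }

  def main(args: Array[String]): Unit = {
    val t = new MyProvider().getTable(Map.empty)
    assert(t.name == "my_table")
    assert(t.capabilities.contains("BATCH_READ"))
  }
}
```

The point of the sketch is that a 2.4-style implementation has no Table abstraction at all, so sources written against one shape cannot run unchanged against the other, which is why a transitional release carrying the 3.0-shaped API would ease parallel support.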

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Xiao Li
Based on my understanding, DSv2 is not stable yet. It is still missing various features. Even our built-in file sources are still unable to fully migrate to DSv2. We plan to enhance it over the next few releases to close the gap. Also, the changes to DSv2 in Spark 3.0 did not break any existing

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Holden Karau
So one of the things we're planning on backporting internally is DSv2, which I think would be more broadly useful if it were available in a community release in a 2.x branch. Anything else on top of that would be considered on a case-by-case basis, depending on whether it makes for an easier upgrade path to 3. If we're

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Sean Owen
What is the functionality that would go into a 2.5.0 release, that can't be in a 2.4.7 release? I think that's the key question. 2.4.x is the 2.x maintenance branch, and I personally could imagine being open to more freely backporting a few new features for 2.x users, whereas usually it's only bug

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Xiao Li
Which new functionalities are you referring to? In Spark SQL, most of the major features in Spark 3.0 are difficult/time-consuming to backport. For example, adaptive query execution. Releasing a new version is not hard, but backporting/reviewing/maintaining these features is very time-consuming.
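For context on the adaptive query execution feature mentioned above: in Spark 3.0 it ships behind a configuration flag rather than on by default. A sketch of the relevant settings (property names as in Spark 3.0; defaults may vary by version):

```properties
# Spark 3.0: enable adaptive query execution (off by default in 3.0).
spark.sql.adaptive.enabled=true
# Optionally let AQE coalesce shuffle partitions at runtime.
spark.sql.adaptive.coalescePartitions.enabled=true
```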

Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Holden Karau
Hi Folks, As we're getting closer to Spark 3 I'd like to revisit a Spark 2.5 release. Spark 3 brings a number of important changes, and by its nature is not backward compatible. I think we'd all like to have as smooth an upgrade experience to Spark 3 as possible, and I believe that having a Spark