Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Reynold Xin
I understand the argument to add JDK 11 support just to extend the EOL, but the other things seem kind of arbitrary and are not supported by your arguments, especially DSv2, which is a massive change. DSv2, IIUC, is not API-stable yet and will continue to evolve in the 3.x line. Spark is designed in

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread DB Tsai
+1 for a 2.x release with DSv2, JDK 11, and Scala 2.11 support. We had an internal preview version of Spark 3.0 for our customers to try out for a while, and we then realized that it's very challenging for enterprise applications in production to move to Spark 3.0. For example, many of our

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Jungtaek Lim
I guess we already went through the same discussion, right? If anyone missed it, please go through the discussion thread. [1] The consensus was not in favor of migrating the new DSv2 into the Spark 2.x version line, because the change is quite large and also backward incompatible. What I

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Ryan Blue
+1 for a 2.x release with a DSv2 API that matches 3.0. There are a lot of big differences between the API in 2.4 and 3.0, and I think a release to help migrate would be beneficial to organizations like ours that will be supporting 2.x and 3.0 in parallel for quite a while. Migration to Spark 3 is
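
To make the scale of the difference Ryan describes concrete, here is a minimal sketch of the two read-side entry points (the MySource class names are hypothetical; the interfaces are the actual 2.4 and 3.0 DSv2 APIs, and each fragment compiles only against its own Spark version):

    // Spark 2.4 DSv2: the source creates a reader directly.
    import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport}
    import org.apache.spark.sql.sources.v2.reader.DataSourceReader

    class MySource24 extends DataSourceV2 with ReadSupport {
      override def createReader(options: DataSourceOptions): DataSourceReader = ???
    }

    // Spark 3.0 DSv2: the source is a TableProvider, and scans are built
    // through Table -> ScanBuilder -> Scan -> Batch.
    import org.apache.spark.sql.connector.catalog.{Table, TableProvider}
    import org.apache.spark.sql.connector.expressions.Transform
    import org.apache.spark.sql.types.StructType
    import org.apache.spark.sql.util.CaseInsensitiveStringMap

    class MySource30 extends TableProvider {
      override def inferSchema(options: CaseInsensitiveStringMap): StructType = ???
      override def getTable(
          schema: StructType,
          partitioning: Array[Transform],
          properties: java.util.Map[String, String]): Table = ???
    }

A connector maintained against both lines has to implement both shapes, which is the dual-support burden being discussed.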

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Xiao Li
Based on my understanding, DSV2 is not stable yet. It is still missing various features. Even our built-in file sources are still unable to fully migrate to DSV2. We plan to enhance it in the next few releases to close the gap. Also, the changes to DSV2 in Spark 3.0 did not break any existing

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Holden Karau
So one of the things we're planning on backporting internally is DSv2, which I think would be more broadly useful if available in a community release in the 2.x branch. Anything else on top of that would be considered on a case-by-case basis, depending on whether it makes for an easier upgrade path to 3. If we're

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Sean Owen
What is the functionality that would go into a 2.5.0 release that can't be in a 2.4.7 release? I think that's the key question. 2.4.x is the 2.x maintenance branch, and I personally could imagine being open to more freely backporting a few new features for 2.x users, whereas usually it's only bug

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Xiao Li
Which new functionalities are you referring to? In Spark SQL, most of the major features in Spark 3.0 are difficult/time-consuming to backport. For example, adaptive query execution. Releasing a new version is not hard, but backporting/reviewing/maintaining these features is very time-consuming.

Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Holden Karau
Hi Folks, As we're getting closer to Spark 3, I'd like to revisit a Spark 2.5 release. Spark 3 brings a number of important changes, and by its nature is not backward compatible. I think we'd all like to have as smooth an upgrade experience to Spark 3 as possible, and I believe that having a Spark

Re: [EXTERNAL] Re: ColumnarBatch to InternalRow Cast exception with codegen enabled.

2020-06-12 Thread Kris Mo
Hi Nasrulla, Without details of your code/configuration, it's a bit hard to tell what exactly went wrong, since there are a lot of places that could go wrong... But one thing that's for sure is that the interpreted code path (non-WSCG) and the WSCG path are two separate things, and it wouldn't
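
For anyone reproducing this, the two paths are easy to compare side by side by toggling the standard whole-stage-codegen SQL conf (a minimal sketch, assuming a SparkSession named spark):

    // Run the failing query under each path and compare:
    spark.conf.set("spark.sql.codegen.wholeStage", "false") // interpreted (non-WSCG) path
    // re-run the query, then:
    spark.conf.set("spark.sql.codegen.wholeStage", "true")  // whole-stage codegen path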

RE: [EXTERNAL] Re: ColumnarBatch to InternalRow Cast exception with codegen enabled.

2020-06-12 Thread Nasrulla Khan Haris
Thanks, Kris, for your input. Yes, I have a new data source which wraps around the built-in Parquet data source. What I do not understand is: with WSCG disabled, the output is not a columnar batch. If my changes do not handle columnar support, shouldn't the behavior remain the same with or without WSCG?

Why time difference while registering a new BlockManager (using BlockManagerMasterEndpoint)?

2020-06-12 Thread Jacek Laskowski
Hi, Just noticed an inconsistency between the time when a BlockManager is about to be registered [1][2] and the time listeners are going to be informed [3], and got curious whether it's intentional or not. Why is the `time` value not used for the SparkListenerBlockManagerAdded message? [1]
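
For readers without the source open, the pattern being asked about looks roughly like this (paraphrased and simplified from BlockManagerMasterEndpoint, not a verbatim quote of the Spark source):

    // Registration captures a `time` value for its own bookkeeping...
    val time = System.currentTimeMillis()
    // ...but the listener event is posted with a fresh clock read
    // instead of reusing `time`, so the two timestamps can differ:
    listenerBus.post(SparkListenerBlockManagerAdded(
      System.currentTimeMillis(), // why not `time`?
      blockManagerId, maxMemSize))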

Re: ColumnarBatch to InternalRow Cast exception with codegen enabled.

2020-06-12 Thread Kris Mo
Hi Nasrulla, Not sure what your new code is doing, but the symptom looks like you're creating a new data source that wraps around the built-in Parquet data source? The problem here is that whole-stage codegen generated code for row-based input, but the actual input is columnar. In other words, in
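
In DSv2 terms, the invariant a wrapping source has to preserve is that what it advertises about columnar support matches what its readers actually emit; otherwise the planner generates row-based code for a columnar scan and fails with exactly this cast. A minimal sketch against the Spark 3.0 DSv2 API (the WrappingReaderFactory class and its delegate are hypothetical; the PartitionReaderFactory interface is the real one):

    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.connector.read.{InputPartition, PartitionReader, PartitionReaderFactory}
    import org.apache.spark.sql.vectorized.ColumnarBatch

    // Hypothetical wrapper around a delegate factory (e.g. the built-in Parquet one).
    // Key invariant: supportsColumnarReads must match what the readers really
    // produce, or WSCG generates row-based code for a columnar input and fails
    // with the ColumnarBatch -> InternalRow cast.
    class WrappingReaderFactory(delegate: PartitionReaderFactory)
        extends PartitionReaderFactory {

      override def supportsColumnarReads(partition: InputPartition): Boolean =
        delegate.supportsColumnarReads(partition) // advertise exactly what we deliver

      override def createReader(partition: InputPartition): PartitionReader[InternalRow] =
        delegate.createReader(partition)

      override def createColumnarReader(partition: InputPartition): PartitionReader[ColumnarBatch] =
        delegate.createColumnarReader(partition)
    }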