These two are coupled, and in tension: we don't want to take on much
change, but we do want changes that will unfortunately be somewhat
breaking. A 2.5 release with these items would be different enough to
strain the general level of compatibility implied by a minor release.
Sure, it's not 'just' a maintenance release, but de facto it then
becomes the maintenance branch for all of 2.x, so it kind of is. 2.4.x
users would then need to move to 2.5 too, as eventually it would be the
only 2.x maintenance branch. OK, you could maintain both 2.4.x and
2.5.x until 2.x is EOL, but that increases the complexity: everything
backported goes to two branches and has to work with both.

I don't know that there's a reason to cut 2.5.0 just on principle; it
had seemed pretty clear to me with 3.0 that 2.4.x was simply the last
2.x release line. We normally maintain versions x and x+1, and will
soon expand to maintaining 2.x + 3.0.x + 3.1.x. So it does depend on
what would go into it.

One person's breaking change is another person's just-fine enhancement,
though. People wouldn't suggest it here unless they were in the latter
group (though are we all talking about the same two major items?).
What I don't know is how that looks across the wider user base.
Obviously, there are a few important votes in favor here. On the other
hand, I haven't heard of significant issues in updating to 3.0 during
the preview releases, which could suggest that users who want DSv2 et
al. can just move to 3.0.

On the items: I don't know enough about DSv2 to say, but it seems like
a big change to backport.
On JDK 11: I understand Java 8 is EOL as far as Oracle is concerned,
but OpenJDK 8 is still being updated, and even Oracle supports it (for
$). Anecdotally, I have not perceived this to be a significant issue
inside or outside Spark.

Yes, this could also be a place for downstream vendors to supply a
specialized hybrid build.

I'm not sure there's an objectively right call here, certainly not
without more than anecdotal or personal perspective on the tradeoffs.
The current plan still seems fine to me, though: leave these items in
3.0.

We can also wait and see. If, after 3.0 is GA, there is clearly wide
demand for a transitional release, that could change the calculation.


On Fri, Jun 12, 2020 at 11:40 PM DB Tsai <dbt...@dbtsai.com> wrote:

> +1 for a 2.x release with DSv2, JDK11, and Scala 2.11 support
>
> We had an internal preview version of Spark 3.0 for our customers to
> try out for a while, and we realized that it's very challenging for
> enterprise applications in production to move to Spark 3.0. For
> example, many of our customers' Spark applications depend on internal
> projects that may not be owned by the ETL teams; it takes a lot of
> coordination with other teams to cross-build those dependencies with
> Scala 2.12 in order to use Spark 3.0. Now that we have removed Scala
> 2.11 support in Spark 3.0, this creates a really big gap in migrating
> from 2.x to 3.0, based on my observation working with our customers.
>
> Also, JDK 8 is already EOL; in some companies, using JDK 8 is not
> supported by the infra team and requires an exception to use an
> unsupported JDK. Of course, those companies can use a vendor's Spark
> distribution such as CDH Spark 2.4, which supports JDK 11, or they
> can maintain their own Spark release, which is possible but not
> trivial.
>
> As a result, having a 2.5 release with DSv2, JDK 11, and Scala 2.11
> support can definitely narrow that gap, and users can still move
> forward with new features. After all, the reason we work on OSS is
> that we want people to use our code, isn't it?
>
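
For context on the cross-building step described above, here is a
minimal build.sbt sketch of what publishing an internal library for
both Scala lines might look like. The project name, organization, and
version choices are hypothetical, purely for illustration:

  // build.sbt for a hypothetical internal library that Spark
  // applications depend on. Cross-building publishes one codebase for
  // both Scala 2.11 (usable with Spark 2.x) and Scala 2.12 (required
  // by Spark 3.0).
  name := "internal-utils"          // hypothetical project name
  organization := "com.example"     // hypothetical organization

  crossScalaVersions := Seq("2.11.12", "2.12.10")
  scalaVersion := crossScalaVersions.value.head

  // Pick the Spark line matching the Scala binary version being built.
  libraryDependencies += "org.apache.spark" %% "spark-sql" % {
    if (scalaBinaryVersion.value == "2.11") "2.4.5" else "3.0.0"
  } % Provided

Running 'sbt +publishLocal' then builds and publishes the artifact once
per entry in crossScalaVersions. Every team owning such a dependency
has to do something like this before an application can move to 3.0,
which is the coordination cost being described.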
