+1 for a 2.x release with DSv2, JDK11, and Scala 2.11 support

We had an internal preview version of Spark 3.0 for our customers to try
out for a while, and then we realized that it's very challenging for
enterprise applications in production to move to Spark 3.0. For example,
many of our customers' Spark applications depend on some internal projects
that may not be owned by ETL teams; it requires much coordination with
other teams to cross-build the dependencies that Spark applications depend
on with Scala 2.12 in order to use Spark 3.0. Because we removed Scala 2.11
support in Spark 3.0, there is a really big gap in migrating from 2.x to
3.0, based on my observation working with our customers.

Also, JDK8 is already EOL; in some companies, the infra team does not
support JDK8, and using an unsupported JDK requires an exception. Of
course, those companies can use a vendor's Spark distribution such as
CDH Spark 2.4, which supports JDK11, or they can maintain their own Spark
release, which is possible but not trivial.

As a result, having a 2.5 release with DSv2, JDK11, and Scala 2.11 support
can definitely narrow the gap, and users can still move forward using new
features. After all, the reason we are working on OSS is that we'd like
people to use our code, isn't it?

Sincerely,

DB Tsai
----------------------------------------------------------
Web: https://www.dbtsai.com
PGP Key ID: 42E5B25A8F7A82C1


On Fri, Jun 12, 2020 at 8:51 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> I guess we already went through the same discussion, right? If anyone
> missed it, please go through the discussion thread. [1] The consensus did
> not seem to favor migrating the new DSv2 into the Spark 2.x version line,
> because the change is huge and also backward incompatible.
>
> The benefit I can think of for having Spark 2.5 is avoiding a forced upgrade
> to the major release just to get fixes for critical bugs. Not all critical
> fixes landed in 2.x, because some fixes bring backward
> incompatibility. We didn't land these fixes in the 2.x version line because
> we hadn't considered having Spark 2.5 before - we didn't want end users to
> tolerate that inconvenience when upgrading to a bugfix version. End users
> may be OK tolerating it when upgrading to a minor version, since they can
> still stay on 2.4.x to avoid these fixes.
>
> In addition, given there's a huge time gap between Spark 2.4 and 3.0, we
> might want to consider porting some of the features that don't bring
> backward incompatibility. New major features of Spark 3.0 would probably
> be better introduced in Spark 3.0, but some features could be ported,
> especially if the feature resolves a long-standing issue or has long been
> available in competing products.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> 1.
> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Spark-2-5-release-td27963.html#a27979
>
> On Sat, Jun 13, 2020 at 10:13 AM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
>> +1 for a 2.x release with a DSv2 API that matches 3.0.
>>
>> There are a lot of big differences between the API in 2.4 and 3.0, and I
>> think a release to help migrate would be beneficial to organizations like
>> ours that will be supporting 2.x and 3.0 in parallel for quite a while.
>> Migration to Spark 3 is going to take time as people build confidence in
>> it. I don't think that can be avoided by leaving a larger feature gap
>> between 2.x and 3.0.
>>
>> On Fri, Jun 12, 2020 at 5:53 PM Xiao Li <lix...@databricks.com> wrote:
>>
>>> Based on my understanding, DSV2 is not stable yet. It is still
>>> missing various features. Even our built-in file sources are still unable
>>> to fully migrate to DSV2. We plan to enhance it in the next few releases
>>> to close the gap.
>>>
>>> Also, the changes on DSV2 in Spark 3.0 did not break any existing
>>> application. We should encourage more users to try Spark 3 and increase the
>>> adoption of Spark 3.x.
>>>
>>> Xiao
>>>
>>> On Fri, Jun 12, 2020 at 5:36 PM Holden Karau <hol...@pigscanfly.ca>
>>> wrote:
>>>
>>>> So one of the things we’re planning on backporting internally is DSv2,
>>>> and I think having it available in a community release on a 2.x branch
>>>> would be more broadly useful. Anything else on top of that would be
>>>> decided case by case, depending on whether it makes for an easier
>>>> upgrade path to 3.
>>>>
>>>> If we’re worried about people using 2.5 as a long term home we could
>>>> always mark it with “-transitional” or something similar?
>>>>
>>>> On Fri, Jun 12, 2020 at 4:33 PM Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> What is the functionality that would go into a 2.5.0 release, that
>>>>> can't be in a 2.4.7 release? I think that's the key question. 2.4.x is the
>>>>> 2.x maintenance branch, and I personally could imagine being open to more
>>>>> freely backporting a few new features for 2.x users, whereas usually it's
>>>>> only bug fixes. Making 2.5.0 implies that 2.5.x is the 2.x maintenance
>>>>> branch but there's something too big for a 'normal' maintenance release,
>>>>> and I think the whole question turns on what that is.
>>>>>
>>>>> If it's things like JDK 11 support, I think that is unfortunately
>>>>> fairly 'breaking' because of dependency updates. But maybe that's not it.
>>>>>
>>>>>
>>>>> On Fri, Jun 12, 2020 at 4:38 PM Holden Karau <hol...@pigscanfly.ca>
>>>>> wrote:
>>>>>
>>>>>> Hi Folks,
>>>>>>
>>>>>> As we're getting closer to Spark 3 I'd like to revisit a Spark 2.5
>>>>>> release. Spark 3 brings a number of important changes, and by its
>>>>>> nature is not backward compatible. I think we'd all like to have as
>>>>>> smooth an upgrade experience to Spark 3 as possible, and I believe
>>>>>> that having a Spark 2 release with some of the new functionality,
>>>>>> while continuing to support the older APIs and current Scala version,
>>>>>> would make the upgrade path smoother.
>>>>>>
>>>>>> This pattern is not uncommon in other Hadoop ecosystem projects, like
>>>>>> Hadoop itself and HBase.
>>>>>>
>>>>>> I know that Ryan Blue has indicated he is already going to be
>>>>>> maintaining something like that internally at Netflix, and we'll be
>>>>>> doing the same thing at Apple. It seems like having a transitional
>>>>>> release could benefit the community with easy migrations and help
>>>>>> avoid duplicated work.
>>>>>>
>>>>>> I want to be clear I'm volunteering to do the work of managing a 2.5
>>>>>> release, so hopefully, this wouldn't create any substantial burdens
>>>>>> on the community.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Holden
>>>>>> --
>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>> https://amzn.to/2MaRAG9
>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>
>>>
>>>
>>> --
>>> <https://databricks.com/sparkaisummit/north-america>
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
