Re: Revisiting the idea of a Spark 2.5 transitional release

Jungtaek Lim Fri, 12 Jun 2020 20:51:41 -0700

I guess we already went through the same discussion, right? If anyone
missed it, please go through the discussion thread. [1] The consensus seemed to
be against migrating the new DSv2 into the Spark 2.x version line,
because the change is huge and also backward incompatible.


The benefit I can think of for having Spark 2.5 is avoiding a forced upgrade
to the major release just to get fixes for critical bugs. Not all critical
fixes landed in 2.x, because some fixes break backward
compatibility. We didn't land these fixes in the 2.x version line because
we hadn't considered having a Spark 2.5 before; we don't want end users to
have to tolerate that inconvenience when upgrading a bugfix version. End users
may be OK tolerating it when upgrading a minor version, since they can still
stay on 2.4.x to avoid these fixes.

In addition, given there's a huge time gap between Spark 2.4 and 3.0, we
might want to consider porting some features that don't break backward
compatibility. Major new features of Spark 3.0 are probably better
introduced only in Spark 3.0, but some features could be ported,
especially if the feature resolves a long-standing issue or has been
provided for a long time in competing products.

Thanks,
Jungtaek Lim (HeartSaVioR)

1.
http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Spark-2-5-release-td27963.html#a27979

On Sat, Jun 13, 2020 at 10:13 AM Ryan Blue <rb...@netflix.com.invalid>
wrote:

> +1 for a 2.x release with a DSv2 API that matches 3.0.
>
> There are a lot of big differences between the API in 2.4 and 3.0, and I
> think a release to help migrate would be beneficial to organizations like
> ours that will be supporting 2.x and 3.0 in parallel for quite a while.
> Migration to Spark 3 is going to take time as people build confidence in
> it. I don't think that can be avoided by leaving a larger feature gap
> between 2.x and 3.0.
>
> On Fri, Jun 12, 2020 at 5:53 PM Xiao Li <lix...@databricks.com> wrote:
>
>> Based on my understanding, DSV2 is not stable yet. It is still
>> missing various features. Even our built-in file sources are still unable to
>> fully migrate to DSV2. We plan to enhance it in the next few releases to
>> close the gap.
>>
>> Also, the changes to DSV2 in Spark 3.0 did not break any existing
>> application. We should encourage more users to try Spark 3 and increase the
>> adoption of Spark 3.x.
>>
>> Xiao
>>
>> On Fri, Jun 12, 2020 at 5:36 PM Holden Karau <hol...@pigscanfly.ca>
>> wrote:
>>
>>> So one of the things we’re planning on backporting internally is
>>> DSv2, which I think would be more broadly useful if it were available in a
>>> community release on a 2.x branch. Anything else on top of that would be
>>> decided case by case, based on whether it makes for an easier upgrade path to 3.
>>>
>>> If we’re worried about people using 2.5 as a long term home we could
>>> always mark it with “-transitional” or something similar?
>>>
>>> On Fri, Jun 12, 2020 at 4:33 PM Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> What is the functionality that would go into a 2.5.0 release, that
>>>> can't be in a 2.4.7 release? I think that's the key question. 2.4.x is the
>>>> 2.x maintenance branch, and I personally could imagine being open to more
>>>> freely backporting a few new features for 2.x users, whereas usually it's
>>>> only bug fixes. Making 2.5.0 implies that 2.5.x is the 2.x maintenance
>>>> branch but there's something too big for a 'normal' maintenance release,
>>>> and I think the whole question turns on what that is.
>>>>
>>>> If it's things like JDK 11 support, I think that is unfortunately
>>>> fairly 'breaking' because of dependency updates. But maybe that's not it.
>>>>
>>>>
>>>> On Fri, Jun 12, 2020 at 4:38 PM Holden Karau <hol...@pigscanfly.ca>
>>>> wrote:
>>>>
>>>>> Hi Folks,
>>>>>
>>>>> As we're getting closer to Spark 3 I'd like to revisit a Spark 2.5
>>>>> release. Spark 3 brings a number of important changes, and by its nature
>>>>> is not backward compatible. I think we'd all like to have as smooth an
>>>>> upgrade experience to Spark 3 as possible, and I believe that having a
>>>>> Spark 2 release with some of the new functionality while continuing to
>>>>> support the older APIs and current Scala version would make the upgrade
>>>>> path smoother.
>>>>>
>>>>> This pattern is not uncommon in other Hadoop ecosystem projects, like
>>>>> Hadoop itself and HBase.
>>>>>
>>>>> I know that Ryan Blue has indicated he is already going to be
>>>>> maintaining something like that internally at Netflix, and we'll be doing
>>>>> the same thing at Apple. It seems like having a transitional release could
>>>>> benefit the community with easy migrations and help avoid duplicated work.
>>>>>
>>>>> I want to be clear that I'm volunteering to do the work of managing a 2.5
>>>>> release, so hopefully, this wouldn't create any substantial burdens on the
>>>>> community.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Holden
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>> https://amzn.to/2MaRAG9
>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>
>>>
>>
>>
>> --
>> <https://databricks.com/sparkaisummit/north-america>
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
