+1 for a 2.x release with a DSv2 API that matches 3.0.

There are a lot of big differences between the API in 2.4 and 3.0, and I
think a release to help migrate would be beneficial to organizations like
ours that will be supporting 2.x and 3.0 in parallel for quite a while.
Migration to Spark 3 is going to take time as people build confidence in
it. I don't think that can be avoided by leaving a larger feature gap
between 2.x and 3.0.

On Fri, Jun 12, 2020 at 5:53 PM Xiao Li <lix...@databricks.com> wrote:

> Based on my understanding, DSV2 is not stable yet. It still misses various
> features. Even our built-in file sources are still unable to fully migrate
> to DSV2. We plan to enhance it in the next few releases to close the gap.
>
> Also, the changes on DSV2 in Spark 3.0 did not break any existing
> application. We should encourage more users to try Spark 3 and increase the
> adoption of Spark 3.x.
>
> Xiao
>
> On Fri, Jun 12, 2020 at 5:36 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> So one of the things which we’re planning on backporting internally is
>> DSv2, which I think would be more broadly useful if it were available in
>> a community release on a 2.x branch. Anything else on top of that would
>> be decided case by case, based on whether it makes for an easier upgrade
>> path to 3.
>>
>> If we’re worried about people using 2.5 as a long term home we could
>> always mark it with “-transitional” or something similar?
>>
>> On Fri, Jun 12, 2020 at 4:33 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> What is the functionality that would go into a 2.5.0 release, that can't
>>> be in a 2.4.7 release? I think that's the key question. 2.4.x is the 2.x
>>> maintenance branch, and I personally could imagine being open to more
>>> freely backporting a few new features for 2.x users, whereas usually it's
>>> only bug fixes. Making 2.5.0 implies that 2.5.x is the 2.x maintenance
>>> branch but there's something too big for a 'normal' maintenance release,
>>> and I think the whole question turns on what that is.
>>>
>>> If it's things like JDK 11 support, I think that is unfortunately fairly
>>> 'breaking' because of dependency updates. But maybe that's not it.
>>>
>>>
>>> On Fri, Jun 12, 2020 at 4:38 PM Holden Karau <hol...@pigscanfly.ca>
>>> wrote:
>>>
>>>> Hi Folks,
>>>>
>>>> As we're getting closer to Spark 3 I'd like to revisit a Spark 2.5
>>>> release. Spark 3 brings a number of important changes, and by its nature is
>>>> not backward compatible. I think we'd all like to have as smooth an upgrade
>>>> experience to Spark 3 as possible, and I believe that having a Spark 2
>>>> release with some of the new functionality while continuing to support the older
>>>> APIs and current Scala version would make the upgrade path smoother.
>>>>
>>>> This pattern is not uncommon in other Hadoop ecosystem projects, like
>>>> Hadoop itself and HBase.
>>>>
>>>> I know that Ryan Blue has indicated he is already going to be
>>>> maintaining something like that internally at Netflix, and we'll be doing
>>>> the same thing at Apple. It seems like having a transitional release could
>>>> benefit the community with easy migrations and help avoid duplicated work.
>>>>
>>>> I want to be clear I'm volunteering to do the work of managing a 2.5
>>>> release, so hopefully, this wouldn't create any substantial burdens on the
>>>> community.
>>>>
>>>> Cheers,
>>>>
>>>> Holden
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>


-- 
Ryan Blue
Software Engineer
Netflix
