Echoing Sean's earlier comment … What is the functionality that would go into a 
2.5.0 release, that can't be in a 2.4.7 release?

On Fri, Jun 12, 2020 at 11:14 PM Holden Karau <holden@pigscanfly.ca> wrote:

> 
> Can I suggest we maybe decouple this conversation a bit? First, see if there
> is agreement in principle on making a transitional release, and then folks
> who feel strongly about specific backports can have their respective
> discussions. It's not like we normally know or have agreement on everything
> going into a release at the time we cut the branch.
> 
> 
> On Fri, Jun 12, 2020 at 10:28 PM Reynold Xin <rxin@databricks.com> wrote:
> 
> 
>> I understand the argument to add JDK 11 support just to extend the EOL,
>> but the other things seem kind of arbitrary and are not supported by your
>> arguments, especially DSv2, which is a massive change. DSv2, IIUC, is not
>> API stable yet and will continue to evolve in the 3.x line.
>> 
>> 
>> Spark is designed in a way that’s decoupled from storage, and as a result
>> one can run multiple versions of Spark in parallel during migration.
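>> 
>> As a rough illustration (the storage path below is hypothetical), the same
>> DataFrame job can be compiled once against Spark 2.4 and once against Spark
>> 3.0, and both builds can run side by side over the same data during a
>> migration:
>> 
>>     // Minimal sketch: the job owns no storage, so a 2.4 build and a 3.0
>>     // build of this same code can run in parallel against the same path.
>>     import org.apache.spark.sql.SparkSession
>> 
>>     object SharedStorageJob {
>>       def main(args: Array[String]): Unit = {
>>         val spark = SparkSession.builder()
>>           .appName("shared-storage-job")
>>           .getOrCreate()
>>         // Hypothetical shared location read by both builds.
>>         val events = spark.read.parquet("s3a://example-bucket/events")
>>         events.groupBy("event_type").count().show()
>>         spark.stop()
>>       }
>>     }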
>> 
> 
> At the job level, sure, but upgrading large jobs, possibly written in Scala
> 2.11, whole-hog as it currently stands is not a small matter.
> 
>> 
>> On Fri, Jun 12, 2020 at 9:40 PM DB Tsai <dbtsai@dbtsai.com> wrote:
>> 
>> 
>>> +1 for a 2.x release with DSv2, JDK11, and Scala 2.11 support
>>> 
>>> 
>>> 
>>> We had an internal preview version of Spark 3.0 for our customers to try
>>> out for a while, and we realized that it's very challenging for enterprise
>>> applications in production to move to Spark 3.0. For example, many of our
>>> customers' Spark applications depend on internal projects that may not be
>>> owned by the ETL teams; it requires a lot of coordination with other teams
>>> to cross-build the dependencies that the Spark applications depend on with
>>> Scala 2.12 in order to use Spark 3.0. Now that we have removed Scala 2.11
>>> support in Spark 3.0, there is a really big gap to migrate from the 2.x
>>> line to 3.0, based on my observation working with our customers.
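>>> 
>>> As a rough sketch of what that cross-build looks like (the project name,
>>> artifact, and version numbers below are hypothetical), an sbt build can
>>> publish the shared dependency for both Scala 2.11/Spark 2.4 and Scala
>>> 2.12/Spark 3.0:
>>> 
>>>     // build.sbt -- minimal cross-building sketch; names and versions are
>>>     // illustrative only.
>>>     ThisBuild / crossScalaVersions := Seq("2.11.12", "2.12.10")
>>> 
>>>     lazy val sharedEtlLib = (project in file("."))
>>>       .settings(
>>>         name := "shared-etl-lib",
>>>         scalaVersion := "2.12.10",
>>>         // Pick the Spark version matching the Scala binary version being built.
>>>         libraryDependencies += "org.apache.spark" %% "spark-sql" % {
>>>           if (scalaBinaryVersion.value == "2.11") "2.4.6" else "3.0.0"
>>>         } % "provided"
>>>       )
>>> 
>>> Running sbt +publishLocal then publishes artifacts for both Scala versions,
>>> so the Spark 2.4 and Spark 3.0 applications can each depend on the matching
>>> build.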
>>> 
>>> 
>>> Also, JDK 8 is already EOL. In some companies, using JDK 8 is not supported
>>> by the infra team, and an exception is required to use an unsupported JDK.
>>> Of course, those companies can use a vendor's Spark distribution such as
>>> CDH Spark 2.4, which supports JDK 11, or they can maintain their own Spark
>>> release, which is possible but not trivial.
>>> 
>>> 
>>> As a result, having a 2.5 release with DSv2, JDK 11, and Scala 2.11 support
>>> would definitely narrow the gap, and users could still move forward and use
>>> the new features. After all, the reason we work on OSS is that we want
>>> people to use our code, isn't it?
>>> 
>>> Sincerely,
>>> 
>>> DB Tsai
>>> ----------------------------------------------------------
>>> Web: https://www.dbtsai.com
>>> PGP Key ID: 42E5B25A8F7A82C1
>>> 
>>> 
>>> 
>>> On Fri, Jun 12, 2020 at 8:51 PM Jungtaek Lim <kabhwan.opensource@gmail.com>
>>> wrote:
>>> 
>>> 
>>>> I guess we already went through the same discussion, right? If anyone
>>>> missed it, please go through the discussion thread. [1] The consensus did
>>>> not look positive on porting the new DSv2 into the Spark 2.x version line,
>>>> because the change is pretty huge and also backward incompatible.
>>>> 
>>>> 
>>>> The benefit I can see in having Spark 2.5 is avoiding a forced upgrade to
>>>> the major release just to get fixes for critical bugs. Not all critical
>>>> fixes landed in 2.x, because some of them bring backward incompatibility.
>>>> We didn't land those fixes in the 2.x version line because we weren't
>>>> considering a Spark 2.5 before - we don't want end users to have to
>>>> tolerate that inconvenience when upgrading to a bugfix version. End users
>>>> may be OK tolerating it when upgrading to a minor version, since they can
>>>> still stay on 2.4.x if they want to avoid these fixes.
>>>> 
>>>> 
>>>> In addition, given there's a huge time gap between Spark 2.4 and 3.0, we
>>>> might want to consider porting some features which don't bring backward
>>>> incompatibility. New major features of Spark 3.0 would probably be better
>>>> introduced only in Spark 3.0, but some features could be ported,
>>>> especially if a feature resolves a long-standing issue or has been
>>>> available for a long time in competing products.
>>>> 
>>>> 
>>>> Thanks,
>>>> Jungtaek Lim (HeartSaVioR)
>>>> 
>>>> 
>>>> 1. http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Spark-2-5-release-td27963.html#a27979
>>>> 
>>>> On Sat, Jun 13, 2020 at 10:13 AM Ryan Blue <rblue@netflix.com.invalid>
>>>> wrote:
>>>> 
>>>> 
>>>>> +1 for a 2.x release with a DSv2 API that matches 3.0.
>>>>> 
>>>>> 
>>>>> There are a lot of big differences between the API in 2.4 and 3.0, and I
>>>>> think a release to help migrate would be beneficial to organizations like
>>>>> ours that will be supporting 2.x and 3.0 in parallel for quite a while.
>>>>> Migration to Spark 3 is going to take time as people build confidence in
>>>>> it. I don't think that can be avoided by leaving a larger feature gap
>>>>> between 2.x and 3.0.
>>>>> 
>>>>> 
>>>>> On Fri, Jun 12, 2020 at 5:53 PM Xiao Li <lixiao@databricks.com> wrote:
>>>>> 
>>>>> 
>>>>>> Based on my understanding, DSv2 is not stable yet. It is still missing
>>>>>> various features; even our built-in file sources are still unable to
>>>>>> fully migrate to DSv2. We plan to enhance it in the next few releases to
>>>>>> close the gap.
>>>>>> 
>>>>>> 
>>>>>> Also, the DSv2 changes in Spark 3.0 did not break any existing
>>>>>> application. We should encourage more users to try Spark 3 and increase
>>>>>> the adoption of Spark 3.x.
>>>>>> 
>>>>>> 
>>>>>> Xiao
>>>>>> 
>>>>>> 
>>>>>> On Fri, Jun 12, 2020 at 5:36 PM Holden Karau <holden@pigscanfly.ca> wrote:
>>>>>> 
>>>>>> 
>>>>>>> So one of the things we're planning on backporting internally is DSv2,
>>>>>>> which I think would be more broadly useful if it were available in a
>>>>>>> community release on a 2.x branch. Anything else on top of that would be
>>>>>>> considered on a case-by-case basis, based on whether it makes the
>>>>>>> upgrade path to 3 easier.
>>>>>>> 
>>>>>>> 
>>>>>>> If we're worried about people using 2.5 as a long-term home, we could
>>>>>>> always mark it with "-transitional" or something similar?
>>>>>>> 
>>>>>>> On Fri, Jun 12, 2020 at 4:33 PM Sean Owen <srowen@gmail.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>>> What is the functionality that would go into a 2.5.0 release that
>>>>>>>> can't be in a 2.4.7 release? I think that's the key question. 2.4.x is
>>>>>>>> the 2.x maintenance branch, and I personally could imagine being open
>>>>>>>> to more freely backporting a few new features for 2.x users, whereas
>>>>>>>> usually it's only bug fixes. Making 2.5.0 implies that 2.5.x is the 2.x
>>>>>>>> maintenance branch but that there's something too big for a 'normal'
>>>>>>>> maintenance release, and I think the whole question turns on what that
>>>>>>>> is.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> If it's things like JDK 11 support, I think that is unfortunately 
>>>>>>>> fairly
>>>>>>>> 'breaking' because of dependency updates. But maybe that's not it.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Fri, Jun 12, 2020 at 4:38 PM Holden Karau <holden@pigscanfly.ca>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Hi Folks,
>>>>>>>>> 
>>>>>>>>> As we're getting closer to Spark 3, I'd like to revisit a Spark 2.5
>>>>>>>>> release. Spark 3 brings a number of important changes, and by its
>>>>>>>>> nature is not backward compatible. I think we'd all like to have as
>>>>>>>>> smooth an upgrade experience to Spark 3 as possible, and I believe
>>>>>>>>> that having a Spark 2 release with some of the new functionality,
>>>>>>>>> while continuing to support the older APIs and the current Scala
>>>>>>>>> version, would make the upgrade path smoother.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> This pattern is not uncommon in other Hadoop ecosystem projects, like
>>>>>>>>> Hadoop itself and HBase.
>>>>>>>>> 
>>>>>>>>> I know that Ryan Blue has indicated he is already going to be 
>>>>>>>>> maintaining
>>>>>>>>> something like that internally at Netflix, and we'll be doing the same
>>>>>>>>> thing at Apple. It seems like having a transitional release could 
>>>>>>>>> benefit
>>>>>>>>> the community with easy migrations and help avoid duplicated work.
>>>>>>>>> 
>>>>>>>>> I want to be clear that I'm volunteering to do the work of managing a
>>>>>>>>> 2.5 release, so hopefully this wouldn't create any substantial burden
>>>>>>>>> on the community.
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> 
>>>>>>>>> Holden
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>> 
>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>> 
>>>>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> https://databricks.com/sparkaisummit/north-america
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Ryan Blue
>>>>> Software Engineer
>>>>> Netflix
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 
> 
> --
> Twitter: https://twitter.com/holdenkarau
> 
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
