Echoing Sean's earlier comment … What is the functionality that would go into a 2.5.0 release, that can't be in a 2.4.7 release?
On Fri, Jun 12, 2020 at 11:14 PM, Holden Karau <hol...@pigscanfly.ca> wrote:

> Can I suggest we decouple this conversation a bit? First, see whether there is agreement in principle on making a transitional release, and then folks who feel strongly about specific backports can have their respective discussions. It's not like we normally know, or have agreement on, everything going into a release at the time we cut the branch.
>
> On Fri, Jun 12, 2020 at 10:28 PM Reynold Xin <rxin@databricks.com> wrote:
>
>> I understand the argument to add JDK 11 support just to extend the EOL, but the other things seem kind of arbitrary and are not supported by your arguments, especially DSv2, which is a massive change. DSv2, IIUC, is not API-stable yet and will continue to evolve in the 3.x line.
>>
>> Spark is designed in a way that's decoupled from storage, and as a result one can run multiple versions of Spark in parallel during migration.
>
> At the job level, sure, but upgrading large jobs, possibly written in Scala 2.11, whole-hog as it currently stands is not a small matter.
>
>> On Fri, Jun 12, 2020 at 9:40 PM DB Tsai <dbtsai@dbtsai.com> wrote:
>>
>>> +1 for a 2.x release with DSv2, JDK 11, and Scala 2.11 support.
>>>
>>> We had an internal preview version of Spark 3.0 for our customers to try out for a while, and we realized that it's very challenging for enterprise applications in production to move to Spark 3.0. For example, many of our customers' Spark applications depend on internal projects that may not be owned by the ETL teams; it requires much coordination with other teams to cross-build the dependencies those applications rely on against Scala 2.12 in order to use Spark 3.0.
>>> Now that Scala 2.11 support has been removed in Spark 3.0, there is a really big gap in migrating from the 2.x line to 3.0, based on my observation working with our customers.
>>>
>>> Also, JDK 8 is already EOL; in some companies, using JDK 8 is not supported by the infra team and requires an exception to run an unsupported JDK. Of course, those companies can use a vendor's Spark distribution, such as CDH Spark 2.4, which supports JDK 11, or they can maintain their own Spark release, which is possible but not trivial.
>>>
>>> As a result, having a 2.5 release with DSv2, JDK 11, and Scala 2.11 support can definitely narrow the gap, and users can still move forward using new features. After all, the reason we work on OSS is that we want people to use our code, isn't it?
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> ----------------------------------------------------------
>>> Web: https://www.dbtsai.com
>>> PGP Key ID: 42E5B25A8F7A82C1
>>>
>>> On Fri, Jun 12, 2020 at 8:51 PM Jungtaek Lim <kabhwan.opensource@gmail.com> wrote:
>>>
>>>> I guess we already went through the same discussion, right? If anyone missed it, please go through the discussion thread. [1] The consensus was not positive about migrating the new DSv2 into the Spark 2.x line, because the change is huge and also backward-incompatible.
>>>>
>>>> The benefit I can see in having Spark 2.5 is avoiding a forced upgrade to the major release just to get fixes for critical bugs. Not all critical fixes landed in 2.x, because some fixes bring backward incompatibility. We didn't land those fixes in the 2.x line because we weren't considering a Spark 2.5 before: we don't want to make end users tolerate that inconvenience when upgrading to a bugfix version.
>>>> End users may be more willing to tolerate it when upgrading to a new minor version, since they can still stay on 2.4.x and decline these fixes.
>>>>
>>>> In addition, given the huge time gap between Spark 2.4 and 3.0, we might want to consider porting some features that don't bring backward incompatibility. New major features of Spark 3.0 are probably better introduced in Spark 3.0 itself, but some features could be ported, especially if a feature resolves a long-standing issue or has been available for a long time in competing products.
>>>>
>>>> Thanks,
>>>> Jungtaek Lim (HeartSaVioR)
>>>>
>>>> 1. http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Spark-2-5-release-td27963.html#a27979
>>>>
>>>> On Sat, Jun 13, 2020 at 10:13 AM Ryan Blue <rblue@netflix.com.invalid> wrote:
>>>>
>>>>> +1 for a 2.x release with a DSv2 API that matches 3.0.
>>>>>
>>>>> There are a lot of big differences between the API in 2.4 and 3.0, and I think a release to help migrate would be beneficial to organizations like ours that will be supporting 2.x and 3.0 in parallel for quite a while. Migration to Spark 3 is going to take time as people build confidence in it. I don't think that can be avoided by leaving a larger feature gap between 2.x and 3.0.
>>>>>
>>>>> On Fri, Jun 12, 2020 at 5:53 PM Xiao Li <lixiao@databricks.com> wrote:
>>>>>
>>>>>> Based on my understanding, DSv2 is not stable yet. It is still missing various features. Even our built-in file sources are still unable to fully migrate to DSv2. We plan to enhance it in the next few releases to close the gap.
>>>>>> Also, the DSv2 changes in Spark 3.0 did not break any existing application. We should encourage more users to try Spark 3 and increase the adoption of Spark 3.x.
>>>>>>
>>>>>> Xiao
>>>>>>
>>>>>> On Fri, Jun 12, 2020 at 5:36 PM Holden Karau <holden@pigscanfly.ca> wrote:
>>>>>>
>>>>>>> So one of the things we're planning on backporting internally is DSv2, which I think would be more broadly useful if it were available in a community release on the 2.x branch. Anything else on top of that would be considered case by case, based on whether it makes for an easier upgrade path to 3.
>>>>>>>
>>>>>>> If we're worried about people using 2.5 as a long-term home, we could always mark it with "-transitional" or something similar?
>>>>>>>
>>>>>>> On Fri, Jun 12, 2020 at 4:33 PM Sean Owen <srowen@gmail.com> wrote:
>>>>>>>
>>>>>>>> What is the functionality that would go into a 2.5.0 release that can't be in a 2.4.7 release? I think that's the key question. 2.4.x is the 2.x maintenance branch, and I personally could imagine being open to more freely backporting a few new features for 2.x users, whereas usually it's only bug fixes. Making 2.5.0 implies that 2.5.x becomes the 2.x maintenance branch but that there's something too big for a 'normal' maintenance release, and I think the whole question turns on what that is.
>>>>>>>>
>>>>>>>> If it's things like JDK 11 support, I think that is unfortunately fairly 'breaking' because of dependency updates. But maybe that's not it.
>>>>>>>>
>>>>>>>> On Fri, Jun 12, 2020 at 4:38 PM Holden Karau <holden@pigscanfly.ca> wrote:
>>>>>>>>
>>>>>>>>> Hi Folks,
>>>>>>>>>
>>>>>>>>> As we're getting closer to Spark 3, I'd like to revisit a Spark 2.5 release. Spark 3 brings a number of important changes and, by its nature, is not backward compatible. I think we'd all like to have as smooth an upgrade experience to Spark 3 as possible, and I believe that having a Spark 2 release with some of the new functionality, while continuing to support the older APIs and the current Scala version, would make the upgrade path smoother.
>>>>>>>>>
>>>>>>>>> This pattern is not uncommon in other Hadoop-ecosystem projects, like Hadoop itself and HBase.
>>>>>>>>>
>>>>>>>>> I know that Ryan Blue has indicated he is already going to be maintaining something like that internally at Netflix, and we'll be doing the same thing at Apple. It seems like having a transitional release could benefit the community with easy migrations and help avoid duplicated work.
>>>>>>>>>
>>>>>>>>> I want to be clear that I'm volunteering to do the work of managing a 2.5 release, so hopefully this wouldn't create any substantial burden on the community.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Holden
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Software Engineer
>>>>> Netflix
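[Editor's note] The cross-building effort DB Tsai describes above is usually handled with sbt's `crossScalaVersions`. The following is a minimal, hypothetical sketch (project name, versions, and layout are illustrative, not from any message in this thread) of how a shared internal library could publish artifacts for both the Spark 2.4 / Scala 2.11 and Spark 3.0 / Scala 2.12 worlds:

```scala
// build.sbt -- hypothetical sketch of cross-building a shared ETL library
// so downstream Spark 2.4 (Scala 2.11) and Spark 3.0 (Scala 2.12) jobs
// can both depend on it during a migration.

ThisBuild / organization := "com.example"
ThisBuild / version      := "0.1.0"

lazy val etlCommon = (project in file("etl-common"))
  .settings(
    name := "etl-common",
    // Publish one artifact per Scala version:
    // etl-common_2.11 and etl-common_2.12
    crossScalaVersions := Seq("2.11.12", "2.12.10"),
    // Match the Spark line to the Scala version being built
    libraryDependencies += {
      scalaBinaryVersion.value match {
        case "2.11" => "org.apache.spark" %% "spark-sql" % "2.4.5" % Provided
        case _      => "org.apache.spark" %% "spark-sql" % "3.0.0" % Provided
      }
    }
  )
```

Running `sbt +publishLocal` then builds and publishes both variants. The coordination cost in the thread comes from the fact that every transitive in-house dependency needs the same treatment before any one job can move to 3.0.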