Can I suggest we maybe decouple this conversation a bit? First, see if there is agreement in principle on making a transitional release; then folks who feel strongly about specific backports can have their respective discussions. It's not like we normally know or have agreement on everything going into a release at the time we cut the branch.
On Fri, Jun 12, 2020 at 10:28 PM Reynold Xin <r...@databricks.com> wrote:

> I understand the argument to add JDK 11 support just to extend the EOL, but the other things seem kind of arbitrary and are not supported by your arguments, especially DSv2, which is a massive change. DSv2 IIUC is not API-stable yet and will continue to evolve in the 3.x line.
>
> Spark is designed in a way that's decoupled from storage, and as a result one can run multiple versions of Spark in parallel during migration.

At the job level, sure, but upgrading large jobs, possibly written in Scala 2.11, whole-hog as it currently stands is not a small matter.

> On Fri, Jun 12, 2020 at 9:40 PM DB Tsai <dbt...@dbtsai.com> wrote:
>
>> +1 for a 2.x release with DSv2, JDK11, and Scala 2.11 support.
>>
>> We had an internal preview version of Spark 3.0 for our customers to try out for a while, and we realized that it's very challenging for enterprise applications in production to move to Spark 3.0. For example, many of our customers' Spark applications depend on internal projects that may not be owned by the ETL teams; it requires much coordination with other teams to cross-build the dependencies that Spark applications depend on with Scala 2.12 in order to use Spark 3.0. Now that we have removed Scala 2.11 support in Spark 3.0, this creates a really big gap when migrating from the 2.x line to 3.0, based on my observation working with our customers.
>>
>> Also, JDK8 is already EOL; in some companies, using JDK8 is not supported by the infra team and requires an exception to use an unsupported JDK. Of course, those companies can use a vendor's Spark distribution such as CDH Spark 2.4, which supports JDK11, or they can maintain their own Spark release, which is possible but not trivial.
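The cross-building DB Tsai describes is typically handled with sbt's `crossScalaVersions`. A minimal sketch of what an internal library's build might look like (the project name and version numbers here are illustrative, not taken from the thread):

```scala
// build.sbt -- illustrative sketch of cross-building an internal library
// for both Scala 2.11 (the Spark 2.x default) and Scala 2.12 (required by
// Spark 3.0). Name and version numbers are hypothetical.
name := "internal-etl-lib"

scalaVersion := "2.12.10"
crossScalaVersions := Seq("2.11.12", "2.12.10")

// %% appends the Scala binary suffix (_2.11 or _2.12) to the artifact name,
// so each cross-build resolves a Spark artifact built for that Scala
// version; Spark 2.4.x publishes artifacts for both.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.6" % Provided
```

With something like this in place, `sbt +publishLocal` publishes one artifact per entry in `crossScalaVersions`, and downstream Spark applications pick up the one matching their Scala binary version. The coordination cost DB Tsai mentions comes from every such internal dependency needing this treatment before an application can move to 3.0.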
>> As a result, having a 2.5 release with DSv2, JDK11, and Scala 2.11 support can definitely lower the gap, and users can still move forward using new features. After all, the reason we are working on OSS is that we want people to use our code, isn't it?
>>
>> Sincerely,
>>
>> DB Tsai
>> ----------------------------------------------------------
>> Web: https://www.dbtsai.com
>> PGP Key ID: 42E5B25A8F7A82C1
>>
>> On Fri, Jun 12, 2020 at 8:51 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>
>>> I guess we already went through the same discussion, right? If anyone missed it, please go through the discussion thread. [1] The consensus did not look positive on migrating the new DSv2 into the Spark 2.x line, because the change is pretty huge, and also backward incompatible.
>>>
>>> What I can think of as a benefit of having Spark 2.5 is avoiding a forced upgrade to the major release to get fixes for critical bugs. Not all critical fixes landed in 2.x, because some fixes bring backward incompatibility. We didn't land these fixes in the 2.x line because we didn't consider having a Spark 2.5 before - we don't want to make end users tolerate the incompatibility when upgrading a bugfix version. End users may be OK with tolerating it when upgrading a minor version, since they can still stay on 2.4.x and decline these fixes.
>>>
>>> In addition, given there's a huge time gap between Spark 2.4 and 3.0, we might want to consider porting some features which don't bring backward incompatibility. New major features of Spark 3.0 would probably be better introduced in Spark 3.0 itself, but some features could be ported, especially if a feature resolves a long-standing issue or has been available for a long time in competing products.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> 1.
>>> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Spark-2-5-release-td27963.html#a27979
>>>
>>> On Sat, Jun 13, 2020 at 10:13 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
>>>
>>>> +1 for a 2.x release with a DSv2 API that matches 3.0.
>>>>
>>>> There are a lot of big differences between the API in 2.4 and 3.0, and I think a release to help migrate would be beneficial to organizations like ours that will be supporting 2.x and 3.0 in parallel for quite a while. Migration to Spark 3 is going to take time as people build confidence in it. I don't think that can be avoided by leaving a larger feature gap between 2.x and 3.0.
>>>>
>>>> On Fri, Jun 12, 2020 at 5:53 PM Xiao Li <lix...@databricks.com> wrote:
>>>>
>>>>> Based on my understanding, DSV2 is not stable yet. It is still missing various features. Even our built-in file sources are still unable to fully migrate to DSV2. We plan to enhance it in the next few releases to close the gap.
>>>>>
>>>>> Also, the changes to DSV2 in Spark 3.0 did not break any existing application. We should encourage more users to try Spark 3 and increase the adoption of Spark 3.x.
>>>>>
>>>>> Xiao
>>>>>
>>>>> On Fri, Jun 12, 2020 at 5:36 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>
>>>>>> So one of the things we're planning on backporting internally is DSv2, and I think having it available in a community release on a 2.x branch would be more broadly useful. Anything else on top of that would be considered on a case-by-case basis, depending on whether it makes for an easier upgrade path to 3.
>>>>>>
>>>>>> If we're worried about people using 2.5 as a long-term home, we could always mark it with "-transitional" or something similar?
>>>>>>
>>>>>> On Fri, Jun 12, 2020 at 4:33 PM Sean Owen <sro...@gmail.com> wrote:
>>>>>>
>>>>>>> What is the functionality that would go into a 2.5.0 release that can't be in a 2.4.7 release?
>>>>>>> I think that's the key question. 2.4.x is the 2.x maintenance branch, and I personally could imagine being open to more freely backporting a few new features for 2.x users, whereas usually it's only bug fixes. Making 2.5.0 implies that 2.5.x is the 2.x maintenance branch but that there's something too big for a 'normal' maintenance release, and I think the whole question turns on what that is.
>>>>>>>
>>>>>>> If it's things like JDK 11 support, I think that is unfortunately fairly 'breaking' because of dependency updates. But maybe that's not it.
>>>>>>>
>>>>>>> On Fri, Jun 12, 2020 at 4:38 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>
>>>>>>>> Hi Folks,
>>>>>>>>
>>>>>>>> As we're getting closer to Spark 3 I'd like to revisit a Spark 2.5 release. Spark 3 brings a number of important changes, and by its nature is not backward compatible. I think we'd all like to have as smooth an upgrade experience to Spark 3 as possible, and I believe that having a Spark 2 release with some of the new functionality, while continuing to support the older APIs and current Scala version, would make the upgrade path smoother.
>>>>>>>>
>>>>>>>> This pattern is not uncommon in other Hadoop ecosystem projects, like Hadoop itself and HBase.
>>>>>>>>
>>>>>>>> I know that Ryan Blue has indicated he is already going to be maintaining something like that internally at Netflix, and we'll be doing the same thing at Apple. It seems like having a transitional release could benefit the community with easy migrations and help avoid duplicated work.
>>>>>>>>
>>>>>>>> I want to be clear I'm volunteering to do the work of managing a 2.5 release, so hopefully this wouldn't create any substantial burdens on the community.
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Holden
>>>>>>>> --
>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix