+1 for a 2.x release with a DSv2 API that matches 3.0. There are a lot of big differences between the API in 2.4 and 3.0, and I think a release to help migrate would be beneficial to organizations like ours that will be supporting 2.x and 3.0 in parallel for quite a while. Migration to Spark 3 is going to take time as people build confidence in it. I don't think that can be avoided by leaving a larger feature gap between 2.x and 3.0.
On Fri, Jun 12, 2020 at 5:53 PM Xiao Li <lix...@databricks.com> wrote:
> Based on my understanding, DSV2 is not stable yet. It is still missing various
> features. Even our built-in file sources are still unable to fully migrate
> to DSV2. We plan to enhance it in the next few releases to close the gap.
>
> Also, the changes to DSV2 in Spark 3.0 did not break any existing
> applications. We should encourage more users to try Spark 3 and increase the
> adoption of Spark 3.x.
>
> Xiao
>
> On Fri, Jun 12, 2020 at 5:36 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> So one of the things we're planning on backporting internally is
>> DSv2, which I think would be more broadly useful if it were available in a
>> community release on a 2.x branch. Anything else on top of that would be
>> decided case by case, based on whether it makes the upgrade path to 3 easier.
>>
>> If we're worried about people using 2.5 as a long-term home, we could
>> always mark it with "-transitional" or something similar.
>>
>> On Fri, Jun 12, 2020 at 4:33 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> What is the functionality that would go into a 2.5.0 release that can't
>>> be in a 2.4.7 release? I think that's the key question. 2.4.x is the 2.x
>>> maintenance branch, and I personally could imagine being open to more
>>> freely backporting a few new features for 2.x users, whereas usually it's
>>> only bug fixes. Making 2.5.0 implies that 2.5.x is the 2.x maintenance
>>> branch but that there's something too big for a 'normal' maintenance
>>> release, and I think the whole question turns on what that is.
>>>
>>> If it's things like JDK 11 support, I think that is unfortunately fairly
>>> 'breaking' because of dependency updates. But maybe that's not it.
>>>
>>> On Fri, Jun 12, 2020 at 4:38 PM Holden Karau <hol...@pigscanfly.ca>
>>> wrote:
>>>
>>>> Hi Folks,
>>>>
>>>> As we're getting closer to Spark 3, I'd like to revisit a Spark 2.5
>>>> release.
>>>> Spark 3 brings a number of important changes, and by its nature is
>>>> not backward compatible. I think we'd all like to have as smooth an
>>>> upgrade experience to Spark 3 as possible, and I believe that having a
>>>> Spark 2 release with some of the new functionality, while continuing to
>>>> support the older APIs and current Scala version, would make the upgrade
>>>> path smoother.
>>>>
>>>> This pattern is not uncommon in other Hadoop ecosystem projects, like
>>>> Hadoop itself and HBase.
>>>>
>>>> I know that Ryan Blue has indicated he is already going to be
>>>> maintaining something like that internally at Netflix, and we'll be doing
>>>> the same thing at Apple. It seems like having a transitional release could
>>>> benefit the community with easy migrations and help avoid duplicated work.
>>>>
>>>> I want to be clear that I'm volunteering to do the work of managing a 2.5
>>>> release, so hopefully this wouldn't create any substantial burdens on the
>>>> community.
>>>>
>>>> Cheers,
>>>>
>>>> Holden
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

--
Ryan Blue
Software Engineer
Netflix