Hi all, I wanted to offer a slightly different perspective regarding the project's long-term health. I see a compelling argument for prioritizing efforts that address *codebase simplification* before investing heavily in a major language upgrade, especially given the Spark Connect option for users and developers.
My main point centers on the value proposition of this significant change:

1. *Spark Connect as an alternative:* For many users, the primary benefits of a major language upgrade, such as access to new features and APIs, are now substantially covered by *Spark Connect*. It already provides a powerful, comparable experience across many use cases, which suggests the urgency of a full internal transition is lower than it may appear.

2. *Impact on long-term maintainability:* My primary concern is the cumulative impact of these changes on the project's technical debt. The codebase already carries complexities (e.g., the parallel support for DataSource V1 and V2, the mix of Java and Scala APIs, and, until not long ago, support for multiple Scala versions) that challenge *readability and maintenance*.

3. *Risk of further fragmentation:* Layering on support for a new major language version (Scala 3), which necessarily differs from previous versions, risks further complicating the build matrix, the internal logic, and the project structure. I worry this could make it even more challenging to onboard new contributors and to manage future patches.

I propose we launch a focused initiative to *tighten and consolidate* the existing codebase. This would involve:

- *API simplification:* Creating a roadmap for the eventual deprecation and removal of older systems such as DataSource V1.
- *Consolidation:* Reducing the remaining areas of language or version fragmentation to make the existing code more straightforward.
- *Project high-level design doc:* A short document (a few pages) or a video explaining the general flow and the most important classes, so that new contributors have a starting point.
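To make point 1 concrete, here is a minimal sketch of what a client looks like when it targets a Spark Connect endpoint instead of an in-process driver. It assumes the Spark Connect Scala client artifact is on the classpath and a Connect server is running locally on the default port; the host/port are placeholders.

```scala
// Minimal sketch of a Spark Connect client (assumes the
// spark-connect-client-jvm artifact and a local Connect server).
import org.apache.spark.sql.SparkSession

object ConnectExample {
  def main(args: Array[String]): Unit = {
    // remote() points the thin client at a Connect endpoint
    // rather than starting a full in-process Spark driver.
    val spark = SparkSession.builder()
      .remote("sc://localhost:15002")
      .getOrCreate()

    spark.range(5).filter("id % 2 = 0").show()
    spark.stop()
  }
}
```

The point is that the client side is decoupled from the server's internals, which is why new language support could live outside the core build.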
By investing in internal cleanup and simplification first, we ensure that any *future* feature or bug fix will be significantly less disruptive and more cost-effective, while support for new languages can be handled in a separate repository based on Spark Connect, so it won't impact the core project.

Any thoughts on this?

Best regards,
Nimrod

On Wed, Nov 5, 2025 at 9:55 AM Norbert Schultz <[email protected]> wrote:

> Hi Tanveer,
>
> The approach with Spark Connect from Dongjoon Hyun seems like a good
> start if we want to run Scala 3 applications with a Spark backend.
>
> However, I would also like to see a Scala 3 build of Spark itself, as it
> would make migrating existing applications easier.
>
> For that, it may be a good idea to just start with a small fork to gather
> more information:
>
> - Update https://github.com/apache/spark/pull/50474
> - There don't seem to be too many Scala macros in the codebase, and there
> is no Shapeless. Good.
> - UDFs, Dataset, Encoders, ScalaReflection, etc. use TypeTag to derive
> encoders. This should be exchanged for a Spark-owned typeclass, which can
> then describe Scala 2 / Scala 3 specific derivations. The Scala 2 code
> can then still rely on TypeTags.
> - Enable Scala 3.3.x on the code and see what breaks. At least Scala with
> SBT supports Scala-version-specific code paths (e.g., src/main/scala-3
> and src/main/scala-2); I am sure Maven can do this too. Scala-2-specific
> code goes to scala-2, and stubs should make it possible to compile under
> Scala 3.
> - Implement the stubs for Scala 3 and see how it goes. TypeTags should
> possibly be replaceable by a combination of ClassTag and Mirror.ProductOf
> (guessing).
>
> This could also be done in a sub-project-wise fashion.
>
> The Scala 3 code style should stay as similar as possible to the existing
> Scala 2 style, in order not to make things more complicated: brace style
> and no unnecessary new features.
>
> Note: I am not deep in the Spark source code.
>
> Kind regards,
> Norbert
>
>
> Am 04.11.2025 um 12:10 schrieb Tanveer Zia <[email protected]>:
> [On 04.11.2025 at 12:10, Tanveer Zia wrote:]
>
> Hi everyone,
>
> I'm Tanveer from Scala Teams. We're interested in contributing to the
> Scala 3 migration of Apache Spark, as referenced in SPARK-54150
> <https://issues.apache.org/jira/browse/SPARK-54150>.
>
> Could you please share the current status or any existing roadmap for
> this migration? We'd also appreciate guidance on how external
> contributors can best get involved or coordinate with the core team on
> next steps.
>
> Best regards,
> *Tanveer Zia*
> Scala Teams
>
>
> Reactive Core GmbH | Paul-Lincke-Ufer 8b | 10999 Berlin
> Fon: +49 30 9832 4666 | Web: www.reactivecore.de
> Handelsregister: Amtsgericht Charlottenburg HRB 156696 B
> Sitz: Berlin | Geschäftsführer: Norbert Schultz
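P.S. To illustrate the typeclass idea from Norbert's email quoted above: the sketch below shows a Spark-owned abstraction that could stand in for the direct `TypeTag` dependency, with a Scala 3 instance derived from `ClassTag` plus `Mirror.ProductOf`, as he guesses. All names here (`SparkTypeDescriptor`, `Person`) are invented for illustration; this is not Spark's actual API, and a Scala 2 counterpart living under `src/main/scala-2` could derive the same typeclass from a `TypeTag`.

```scala
// Hypothetical Scala 3 sketch: a Spark-owned typeclass replacing the
// direct dependency on scala.reflect TypeTags. Names are illustrative.
import scala.deriving.Mirror
import scala.reflect.ClassTag

// Enough information to build an encoder, independent of the
// Scala version's reflection machinery.
trait SparkTypeDescriptor[T] {
  def runtimeClass: Class[?]
  def fieldNames: Seq[String]
}

object SparkTypeDescriptor {
  // Scala 3 derivation: ClassTag supplies the runtime class, and
  // Mirror.ProductOf exposes case-class field names at compile time
  // (roughly the role TypeTag played in Scala 2).
  inline given derived[T <: Product](using
      m: Mirror.ProductOf[T],
      ct: ClassTag[T]
  ): SparkTypeDescriptor[T] = {
    val labels = scala.compiletime
      .constValueTuple[m.MirroredElemLabels]
      .productIterator.map(_.toString).toList
    new SparkTypeDescriptor[T] {
      def runtimeClass: Class[?] = ct.runtimeClass
      def fieldNames: Seq[String] = labels
    }
  }
}

// Usage: the descriptor for a case class is derived automatically.
case class Person(name: String, age: Int)

@main def demo(): Unit = {
  val desc = summon[SparkTypeDescriptor[Person]]
  println(desc.runtimeClass.getName)
  println(desc.fieldNames) // List(name, age)
}
```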
