Hi, all, and Matei (as the Chair of the Apache Spark PMC).

Sorry for the long email. I want to share two topics and the corresponding action items. You can jump directly to "Section 3: Action Items" for the conclusion.
### 1. ASF Policy Violation ###

ASF has a rule, "MAY I CALL MY MODIFIED CODE 'APACHE'?":

https://www.apache.org/foundation/license-faq.html#Name-changes

For example, when we call something `Apache Spark 3.4.0`, it is supposed to be identical to one of our official distributions:

https://downloads.apache.org/spark/spark-3.4.0/

Specifically, in terms of the Scala version, it should ship Scala 2.12.17 because of 'SPARK-40436 Upgrade Scala to 2.12.17'.

There is a company claiming something non-Apache, effectively "Apache Spark 3.4.0 minus SPARK-40436", under the name "Apache Spark 3.4.0":

- The company website shows "X.Y (includes Apache Spark 3.4.0, Scala 2.12)".
- The runtime logs "23/06/05 04:23:27 INFO SparkContext: Running Spark version 3.4.0".
- The UI shows the Apache Spark logo and `3.4.0`.
- However, the Scala version is '2.12.15'.

[image: Screenshot 2023-06-04 at 9.37.16 PM.png][image: Screenshot 2023-06-04 at 10.14.45 PM.png]

Lastly, this is not a single instance. For example, the same company also claims "Apache Spark 3.3.2" with a mismatched Scala version.

### 2. Scala Issues ###

In addition to (1), although we proceeded with good intentions and great care, including dev mailing list discussions, there are several concerning areas that need more attention and our love.

a) Scala Spark users will experience a UX inconvenience from Spark 3.5.

SPARK-42493 Make Python the first tab for code examples

For the record, we discussed it here:

- https://lists.apache.org/thread/1p8s09ysrh4jqsfd47qdtrl7rm4rrs05
  "[DISCUSS] Show Python code examples first in Spark documentation"

b) The Scala version upgrade is currently blocked by the Ammonite library's dev cycle. Although we discussed it here and it had good intentions, the current master branch cannot use the latest Scala.

- https://lists.apache.org/thread/4nk5ddtmlobdt8g3z8xbqjclzkhlsdfk
  "Ammonite as REPL for Spark Connect"

SPARK-42884 Add Ammonite REPL integration

Specifically, the following are blocked, and I'm monitoring the Ammonite repository:
- SPARK-40497 Upgrade Scala to 2.13.11
- SPARK-43832 Upgrade Scala to 2.12.18

According to https://github.com/com-lihaoyi/Ammonite/issues , Scala 3.3.0 LTS support also looks infeasible.

Although we may be able to wait for a while, there are two fundamental solutions to unblock this situation from a long-term maintenance perspective:

- Replace it with a scala-shell based implementation.
- Move `connector/connect/client/jvm/pom.xml` out of the Spark repo. Maybe we can put it into a new repo, like the Rust and Go clients.

c) Scala 2.13 and above need Apache Spark 4.0.

In the "Apache Spark 3.5.0 Expectations?" and "Apache Spark 4.0 Timeframe?" threads, we discussed the Spark 3.5.0 scope and decided to revert 'SPARK-43836 Make Scala 2.13 as default in Spark 3.5'. Apache Spark 4.0.0 is the only way to support Scala 2.13 or higher.

- https://lists.apache.org/thread/3x6dh17bmy20n3frtt3crgxjydnxh2o0
  ("Apache Spark 3.5.0 Expectations?")
- https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
  ("Apache Spark 4.0 Timeframe?")

A candidate (or mentioned) timeframe was "Spark 4.0.0: 2024.06" with Scala 3.3.0 LTS.

- https://scala-lang.org/blog/2023/05/30/scala-3.3.0-released.html

d) Java 21 LTS is a stretch goal for Apache Spark 3.5.0.

SPARK-43831 Build and Run Spark on Java 21

However, this needs SPARK-40497 (Scala 2.13.11) and SPARK-43832 (Scala 2.12.18), which are blocked by the Ammonite library as mentioned in (b).

### 3. Action Items ###

To provide clarity to the Apache Spark Scala community:

- We should communicate with and help the company to fix the misleading messages and remove the per-Spark-version Scala-version segmentation.
- The Apache Spark PMC should include this incident report and its result in the next Apache Spark quarterly report (August).
- I will start a vote on the Apache Spark 4.0.0 timeframe next week, after receiving more feedback. Since 4.0.0 is not limited to the Scala issues, we will vote on the timeline only.
- Lastly, we need to re-evaluate the risk of the `Ammonite` library before the Apache Spark 3.5.0 release. If it blocks the Scala upgrade and Java 21 support, we had better avoid it at all costs.

WDYT?

Thanks,
Dongjoon.
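
P.S. For anyone who wants to double-check their own environment against the mismatch described in Section 1, the Scala side of the pair can be read from the runtime itself. This is just a minimal standalone sketch (object and message names are mine); inside a Spark session, `spark.version` reports the Spark side of the pair.

```scala
// Minimal sketch: print the Scala version the current runtime is using.
// On a genuine Apache Spark 3.4.0 distribution this should report
// 2.12.17 (per SPARK-40436); the vendor build above reports 2.12.15.
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // e.g. "2.12.17"
    println(s"Scala ${scala.util.Properties.versionNumberString}")
  }
}
```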