Hi all, and Matei (as the Chair of the Apache Spark PMC).

Sorry for the long email; I want to share two topics and the corresponding action
items.
You can go directly to "Section 3: Action Items" for the conclusion.


### 1. ASF Policy Violation ###

The ASF has a rule for "MAY I CALL MY MODIFIED CODE 'APACHE'?"

    https://www.apache.org/foundation/license-faq.html#Name-changes

For example, when something is called `Apache Spark 3.4.0`, it's supposed to be
the same as one of our official distributions.

    https://downloads.apache.org/spark/spark-3.4.0/

Specifically, in terms of the Scala version, we believe it should have
Scala 2.12.17 because of 'SPARK-40436 Upgrade Scala to 2.12.17'.

There is a company distributing something non-Apache, effectively "Apache Spark
3.4.0 minus SPARK-40436", under the name "Apache Spark 3.4.0."

    - The company website shows "X.Y (includes Apache Spark 3.4.0, Scala 2.12)"
    - The runtime logs show "23/06/05 04:23:27 INFO SparkContext: Running Spark
      version 3.4.0"
    - The UI shows the Apache Spark logo and `3.4.0`.
    - However, the Scala version is '2.12.15' (a minimal check is sketched below).

[image: Screenshot 2023-06-04 at 9.37.16 PM.png] [image: Screenshot 2023-06-04 at 10.14.45 PM.png]
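
For reference, here is a minimal sketch (mine, not taken from the report above) of
how one can check this from a running `spark-shell` session:

    // Compare the advertised Spark version with the Scala version the
    // runtime was actually built with. `spark` is in scope in spark-shell.
    println(spark.version)                        // e.g. 3.4.0
    println(scala.util.Properties.versionString)  // e.g. version 2.12.15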

Lastly, this is not an isolated instance. For example, the same company also
claims "Apache Spark 3.3.2" with a mismatched Scala version.


### 2. Scala Issues ###

In addition to (1), although we proceeded with good intentions and great care,
including dev mailing list discussions, there are several concerning areas
which need more attention and our love.

a) Scala Spark users will experience a UX inconvenience starting from Spark 3.5.

    SPARK-42493 Make Python the first tab for code examples

    For the record, we discussed it here.
    - https://lists.apache.org/thread/1p8s09ysrh4jqsfd47qdtrl7rm4rrs05
      "[DISCUSS] Show Python code examples first in Spark documentation"

b) The Scala version upgrade is currently blocked by the Ammonite library's dev
cycle.

    Although we discussed it here and it was done with good intentions,
    the current master branch cannot use the latest Scala.

    - https://lists.apache.org/thread/4nk5ddtmlobdt8g3z8xbqjclzkhlsdfk
      "Ammonite as REPL for Spark Connect"
    - SPARK-42884 Add Ammonite REPL integration

    Specifically, the following are blocked, and I'm monitoring the Ammonite
    repository (see the sketch after this list).
    - SPARK-40497 Upgrade Scala to 2.13.11
    - SPARK-43832 Upgrade Scala to 2.12.18
    - According to https://github.com/com-lihaoyi/Ammonite/issues ,
      Scala 3.3.0 LTS support also looks infeasible.
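
    As context, Ammonite artifacts are cross-published against the full Scala
    version, so every Scala patch upgrade needs a matching Ammonite release.
    A hypothetical sbt-style illustration (Spark's real build uses Maven; the
    version number below is only illustrative):

        // Ammonite is resolved per *full* Scala version (e.g. ammonite_2.12.17),
        // so bumping Spark to Scala 2.12.18 needs a new Ammonite artifact first.
        libraryDependencies += ("com.lihaoyi" % "ammonite" % "2.5.8").cross(CrossVersion.full)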

    Although we may be able to wait for a while, there are two fundamental
    solutions to unblock this situation from a long-term maintenance perspective.
    - Replace it with a Scala-shell based implementation
    - Move `connector/connect/client/jvm/pom.xml` out of the Spark repo.
      Maybe we can put it into a new repo like the Rust and Go clients.

c) Scala 2.13 and above require Apache Spark 4.0.

    In "Apache Spark 3.5.0 Expectations?" and "Apache Spark 4.0 Timeframe?"
threads,
    we discussed Spark 3.5.0 scope and decided to revert
    "SPARK-43836 Make Scala 2.13 as default in Spark 3.5".
    Apache Spark 4.0.0 is the only way to support Scala 2.13 or higher.

    - https://lists.apache.org/thread/3x6dh17bmy20n3frtt3crgxjydnxh2o0
      ("Apache Spark 3.5.0 Expectations?")
    - https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
      ("Apache Spark 4.0 Timeframe?")

     A candidate (or mentioned) timeframe was "Spark 4.0.0: 2024.06" with
     Scala 3.3.0 LTS.
     - https://scala-lang.org/blog/2023/05/30/scala-3.3.0-released.html

d) Java 21 LTS is Apache Spark 3.5.0's stretch goal

    SPARK-43831 Build and Run Spark on Java 21

    However, this needs SPARK-40497 (Scala 2.13.11) and SPARK-43832
    (Scala 2.12.18), which are blocked by the Ammonite library as mentioned in (b).


### 3. Action Items ###

To provide clarity to the Apache Spark Scala community,

- We should communicate with and help the company to fix the misleading messages
  and remove the Scala-version segmentation per Spark version.

- Apache Spark PMC should include this incident report and the result
  in the next Apache Spark Quarterly Report (August).

- I will start a vote on the Apache Spark 4.0.0 timeframe next week after
  receiving more feedback.
  Since 4.0.0 is not limited to the Scala issues, we will vote on the
  timeline only.

- Lastly, we need to re-evaluate the risk of the `Ammonite` library before the
  Apache Spark 3.5.0 release.
  If it blocks the Scala upgrade and Java 21 support, we had better avoid it at
  all costs.


WDYT?

Thanks,
Dongjoon.
