Hi,

I guess that the pending Spark 3.3 support is a big enough feature to warrant a new Sedona release. It makes no sense to remove Spark 2.4 and Scala 2.11 before the release.
After Sedona-next is released, I think that Spark 2.4 can safely be removed.

Long term, though I think that's a separate discussion, there are a lot of benefits to moving the code shared between Sedona-Spark and Sedona-Flink into a common Java-only module (sedona-common?). That would include the partitioning code and probably most ST_x/RS_x functions. It would give Sedona-Flink first-class, Scala-free support. It would also open up Sedona to other JVM data tools, regardless of whether they are written in Java, Scala, Kotlin, Clojure or any other JVM language. Possibly Sedona-Kafka, Sedona-Hive etc. That would make Scala version support a Sedona-Spark issue only and not a general Sedona issue.

Br,
Martin

On 2022/06/19 06:10:31 Jia Yu wrote:
> Dear all,
>
> I am proposing to drop the support of Spark 2.4 and Scala 2.11 in the next
> Sedona release. The version number will be 1.3.0 if we drop this support,
> otherwise it will be 1.2.1.
>
> Here is the status of Spark 2.4 and Sedona for Spark 2.4
> 1. Spark community has announced Spark 2.4 EOL on March 03 2021:
> https://www.mail-archive.com/dev@spark.apache.org/msg27476.html
> 2. Spark 3.0 was released on 06-16-2020.
> 3. Spark 3.3.0 was released a few days ago. And starting from Spark 3.2,
> Spark releases binaries for both Scala 2.12 and 2.13.
> 4. Only a few Sedona users are using Spark 2.4. According to the statistics
> of Maven Central (Scala/Java API only), only around 1K out of 100K
> downloads are using Sedona for Spark 2.4. (core-2.4_2.11, core-2.4_2.12,
> python-adapter-2.4_2.11, python-adapter-2.4_2.12)
>
> Benefits of dropping the support:
> 1. Reduce the complexity of maintaining the source code for different Spark
> versions. Currently, several files have two versions for Spark 2.4 and 3.x,
> controlled by "anchor" keywords. I wrote a Python script to pre-process the
> source code all the time:
> https://github.com/apache/incubator-sedona/blob/master/spark-version-converter.py
> 2. Reduce the overhead of releasing binary packages. Currently, the main
> POM.xml is quite complex in order to compile against different Spark
> versions. Therefore, we weren't able to release Sedona for Scala 2.13.
>
> Plan of Sedona for Spark 3.X
> 1. Sedona source code already supports Scala 2.13 but no Sedona binary
> release. We will release Sedona for both Scala 2.12 and 2.13, but no Scala
> 2.11.
> 2. Sedona already releases binaries for Spark 3.0, 3.1, 3.2
> 3. The two latest PRs of Sedona are adding the support for Spark 3.3.
> https://github.com/apache/incubator-sedona/pull/636
> https://github.com/apache/incubator-sedona/pull/635
>
> What do you think of this proposal? If you don't like this, what is the
> best time to drop the support of Spark 2.4 and Scala 2.11?
>
> I will let this discussion open for at least 3 days. If no objection, I
> will remove Spark 2.4 from POM.xml and GitHub Actions, but leave the Spark
> 2.4 support in the source code. So whoever wants to use Sedona on Spark 2.4
> can still compile the source code by themselves.
>
> Thanks,
> Jia