Hi,

I guess that the pending Spark 3.3 support is a big enough feature to
warrant a new Sedona release. It makes no sense to remove Spark 2.4 and
Scala 2.11 before the release.

After Sedona-next is released I think that Spark 2.4 can safely be removed.

Long term, though I think that's a separate discussion, there are a lot of
benefits to moving the code shared between Sedona-Spark and Sedona-Flink
into a common Java-only module (sedona-common?). That would include the
partitioning code and probably most ST_x/RS_x functions.

That would give Sedona-Flink first-class, Scala-free support. It would
also open up Sedona to other JVM data tools, regardless of whether they are
written in Java, Scala, Kotlin, Clojure or any other JVM language. Possibly
Sedona-Kafka, Sedona-Hive, etc. That would make Scala version support a
Sedona-Spark issue only and not a general Sedona issue.
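
To make the idea concrete, here is a rough sketch in plain Java. All names
(CommonFunctions, SparkStyleDistance, FlinkStyleDistance) are hypothetical,
and the two wrapper classes are only stand-ins for engine-specific glue, not
real Spark or Flink APIs. The distance function is just a placeholder for an
ST_x function body:

```java
import java.util.function.BiFunction;

// Hypothetical "sedona-common" class: plain Java, no Spark, Flink or Scala imports.
final class CommonFunctions {
    private CommonFunctions() {}

    // Euclidean distance between two 2D points; placeholder for an ST_x function body.
    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }
}

// Stand-in for a thin wrapper living in Sedona-Spark (assumption, not a real Spark API).
final class SparkStyleDistance implements BiFunction<double[], double[], Double> {
    @Override
    public Double apply(double[] a, double[] b) {
        return CommonFunctions.distance(a, b);
    }
}

// Stand-in for a thin wrapper living in Sedona-Flink (assumption, not a real Flink API).
final class FlinkStyleDistance {
    public Double eval(double[] a, double[] b) {
        return CommonFunctions.distance(a, b);
    }
}
```

The point is only that the wrappers contain no logic of their own, so the
Scala version would matter for the Spark wrapper and nowhere else.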

Br,
Martin

On 2022/06/19 06:10:31 Jia Yu wrote:
> Dear all,
>
> I am proposing to drop the support of Spark 2.4 and Scala 2.11 in the next
> Sedona release. The version number will be 1.3.0 if we drop this support,
> otherwise it will be 1.2.1.
>
> Here is the status of Spark 2.4 and Sedona for Spark 2.4
> 1. Spark community has announced Spark 2.4 EOL on March 03 2021:
> https://www.mail-archive.com/dev@spark.apache.org/msg27476.html
> 2. Spark 3.0 was released on 06-16-2020.
> 3. Spark 3.3.0 was released a few days ago. And starting from Spark 3.2,
> Spark releases binaries for both Scala 2.12 and 2.13.
> 4. Only a few Sedona users are using Spark 2.4. According to the statistics
> of Maven Central (Scala/Java API only), only around 1K out of 100K
> downloads are using Sedona for Spark 2.4. (core-2.4_2.11, core-2.4_2.12,
> python-adapter-2.4_2.11, python-adapter-2.4_2.12)
>
> Benefits of dropping the support:
> 1. Reduce the complexity of maintaining the source code for different Spark
> versions. Currently, several files have two versions for Spark 2.4 and 3.x,
> controlled by "anchor" keywords. I wrote a Python script to pre-process the
> source code all the time:
> https://github.com/apache/incubator-sedona/blob/master/spark-version-converter.py
> 2. Reduce the overhead of releasing binary packages. Currently, the main
> POM.xml is quite complex in order to compile against different Spark
> versions. Therefore, we weren't able to release Sedona for Scala 2.13.
>
> Plan of Sedona for Spark 3.X
> 1. Sedona source code already supports Scala 2.13, but there is no Sedona
> binary release for it. We will release Sedona for both Scala 2.12 and 2.13,
> but not for Scala 2.11.
> 2. Sedona already releases binaries for Spark 3.0, 3.1, 3.2
> 3. The two latest PRs of Sedona are adding the support for Spark 3.3.
> https://github.com/apache/incubator-sedona/pull/636
> https://github.com/apache/incubator-sedona/pull/635
>
> What do you think of this proposal? If you don't like this, what is the
> best time to drop the support of Spark 2.4 and Scala 2.11?
>
> I will leave this discussion open for at least 3 days. If there is no
> objection, I will remove Spark 2.4 from POM.xml and GitHub Actions, but
> leave the Spark 2.4 support in the source code. So whoever wants to use
> Sedona on Spark 2.4 can still compile the source code by themselves.
>
> Thanks,
> Jia
>
