Hi everyone,

The Spark runtime artifacts currently have inconsistent names across
Spark versions:

# Spark 2.4
iceberg-spark-runtime-0.13.1.jar
# Spark 3.0
iceberg-spark3-runtime-0.13.1.jar
# Spark 3.1
iceberg-spark-runtime-3.1_2.12-0.13.1.jar
# Spark 3.2
iceberg-spark-runtime-3.2_2.12-0.13.1.jar

From the Spark 3.1 and Spark 3.2 runtime artifact names, we can easily
recognize:
1. Which Spark version (major.minor) the runtime jar targets
2. Which Scala version the runtime jar is compiled with
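As a rough illustration of why the versioned names are easier to work
with, here is a minimal Python sketch that extracts both pieces of
information from a jar name. The regular expression and the function
name parse_runtime_jar are assumptions made for this example; they are
not part of Iceberg or its build.

```python
import re

# Hypothetical pattern for names of the form
# iceberg-spark-runtime-<spark-major.minor>_<scala-version>-<iceberg-version>.jar
VERSIONED_NAME = re.compile(
    r"^iceberg-spark-runtime-"
    r"(?P<spark>\d+\.\d+)_(?P<scala>\d+\.\d+)-"
    r"(?P<iceberg>\d+(?:\.\d+)*)\.jar$"
)


def parse_runtime_jar(name):
    """Return the Spark, Scala, and Iceberg versions encoded in the name,
    or None when the name does not follow the versioned format."""
    match = VERSIONED_NAME.match(name)
    return match.groupdict() if match else None


for jar in (
    "iceberg-spark-runtime-0.13.1.jar",           # Spark 2.4
    "iceberg-spark3-runtime-0.13.1.jar",          # Spark 3.0
    "iceberg-spark-runtime-3.1_2.12-0.13.1.jar",  # Spark 3.1
    "iceberg-spark-runtime-3.2_2.12-0.13.1.jar",  # Spark 3.2
):
    print(jar, "->", parse_runtime_jar(jar))
```

Under this pattern only the Spark 3.1 and Spark 3.2 names yield a
result; the two legacy names carry no Spark or Scala version to extract.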

But for Spark 3.0 and Spark 2.4, the names do not convey this
information. I think we kept those legacy names because they were
introduced in older Iceberg releases and we wanted to avoid renaming
modules that users already depend on, but they are indeed causing
confusion for new community users.

In general, we have two options:

Option#1: Keep the current artifact names. That means Spark 2.4 and
Spark 3.0 will continue to use iceberg-spark-runtime-<iceberg-version>.jar
and iceberg-spark3-runtime-<iceberg-version>.jar until they are retired
from the official Apache Iceberg repo.
Option#2: Rename the Spark 2.4 and Spark 3.0 artifacts to the generic
name format:
iceberg-spark-runtime-<spark-major.minor>_<scala-version>-<iceberg-version>.jar.
This gives all Spark versions one consistent name format.
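To make the generic format concrete, a small Python sketch that fills
in the template is shown here. The helper runtime_jar_name is
hypothetical, and the Scala 2.12 value used for Spark 3.0 is an
assumption for illustration; the real names would be whatever the build
ends up publishing.

```python
def runtime_jar_name(spark_version, scala_version, iceberg_version):
    """Fill in the generic format:
    iceberg-spark-runtime-<spark-major.minor>_<scala-version>-<iceberg-version>.jar
    """
    return (
        f"iceberg-spark-runtime-{spark_version}_{scala_version}"
        f"-{iceberg_version}.jar"
    )


# Assumed inputs for illustration: Spark 3.0 built with Scala 2.12.
print(runtime_jar_name("3.0", "2.12", "0.13.1"))
# Reproduces the existing Spark 3.2 name from the same template.
print(runtime_jar_name("3.2", "2.12", "0.13.1"))
```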

Personally, I'd prefer option#2 because it is friendlier to new
community users (although it will require existing users to update the
artifact names in their pom.xml).

What is your preference?

References:
1.  I created a PR to change the artifact names, and we had a few
discussions there: https://github.com/apache/iceberg/pull/4158
2.  https://github.com/apache/iceberg-docs/pull/27#discussion_r800297155
