Hi Ryan,

I agree with this approach. It is perfectly aligned with the points I
mentioned a while ago.

Regards,
JB

On Thu, Apr 30, 2026 at 7:49 PM Ryan Blue <[email protected]> wrote:

> Hi everyone,
>
> I have a quick update on LICENSE issues that are currently blocking 1.11
> and 1.10.2. Also, sorry if you got this twice, but it looks like it didn't
> go through the first time.
>
> TL;DR: I think we should:
>
>    - Hold off on adding Kafka Connect to the release process
>    - Remove the iceberg-open-api-test-fixtures-runtime Jar from releases
>
> The background is that over the last few weeks, we found two fairly large
> leaks that added transitive dependencies into Iceberg runtime Jars (fixed
> by #15655 <https://github.com/apache/iceberg/pull/15655> and #15858
> <https://github.com/apache/iceberg/pull/15858>). As a result, Russell
> added a new way to track and validate the dependencies included in our
> published artifacts. To make sure the new checks are correct, I’ve been
> going through to validate the LICENSE/NOTICE files against the dependency
> list. Unfortunately, there are more problems.
>
> The first problem is with our Kafka Connect distribution. There are two
> zip distributions, a Hive and a non-Hive version. Robin has been working on
> getting these published as part of our release process in #15212
> <https://github.com/apache/iceberg/pull/15212>. The non-Hive distribution
> is very large and has some dependencies that may not need to be there, like
> Apache Commons Jars that aren’t used in Iceberg (and would be provided by
> KC if needed?). #16147 <https://github.com/apache/iceberg/pull/16147> is
> a draft with some of the non-Hive changes. The Hive distribution has about
> 100 more Jars than non-Hive, and includes many dependencies that are almost
> certainly unnecessary, like 3 hadoop-mapreduce-* Jars. *My recommendation
> is to hold off on making Kafka Connect part of releases until the license
> issues are solved*.
>
> Another issue is the open-api module. We added this to the Java build to
> verify the REST catalog spec, but then added tests and fixtures for
> validating REST implementations. #11279
> <https://github.com/apache/iceberg/pull/11279> added a runtime Jar to
> run a test service, but most PMC members I’ve talked to about it didn’t
> know that we have been publishing it, let alone that we have been since 1.7. This
> runtime Jar indiscriminately bundles far more libraries than it needs, like
> the cloud provider libs, Hadoop common, JUnit, Jetty, and others. The Jar
> is 200+ MB
> <https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-open-api/1.10.1/>.
> *My recommendation is to remove this Jar from publication to unblock
> releases*.
>
> As a general rule, when we are considering adding a new runtime
> distribution to the project, we need to check that it is something we need
> to do (vs an easy alternative), and if it is, then minimize the
> dependencies included to only those required to run it. Once that’s done,
> we need to document the dependencies in LICENSE and NOTICE and, as of
> #15855 <https://github.com/apache/iceberg/pull/15855>, ensure that the
> bundled dependencies are tracked in a runtime-deps.txt file.
>
> I think the priority right now is to unblock the 1.11 and 1.10.2 releases.
> We can do that by not releasing these artifacts. After that, I think we
> need to verify for all of these that they are needed, have minimal included
> dependencies, and then document those dependencies. For example, do we need
> a Kafka Connect Hive distribution or is the REST catalog version enough?
> Does everyone agree that this is the right path forward?
>
> Thanks,
>
> Ryan
>
