Hi Ryan, I agree with this approach. It is perfectly aligned with the points I mentioned a while ago.
Regards,
JB

On Thu, Apr 30, 2026 at 7:49 PM Ryan Blue <[email protected]> wrote:

> Hi everyone,
>
> I have a quick update on LICENSE issues that are currently blocking 1.11
> and 1.10.2. Also, sorry if you got this twice, but it looks like it didn't
> go through the first time.
>
> TL;DR: I think we should:
>
> - Hold off on adding Kafka Connect to the release process
> - Remove the iceberg-open-api-test-fixtures-runtime Jar from releases
>
> The background is that over the last few weeks, we found two fairly large
> leaks that added transitive dependencies into Iceberg runtime Jars (fixed
> by #15655 <https://github.com/apache/iceberg/pull/15655> and #15858
> <https://github.com/apache/iceberg/pull/15858>). As a result, Russell
> added a new way to track and validate the dependencies included in our
> published artifacts. To make sure the new checks are correct, I’ve been
> going through to validate the LICENSE/NOTICE files against the dependency
> list. Unfortunately, there are more problems.
>
> The first problem is with our Kafka Connect distribution. There are two
> zip distributions, a Hive and a non-Hive version. Robin has been working on
> getting these published as part of our release process in #15212
> <https://github.com/apache/iceberg/pull/15212>. The non-Hive distribution
> is very large and has some dependencies that may not need to be there, like
> Apache Commons Jars that aren’t used in Iceberg (and would be provided by
> KC if needed?). #16147 <https://github.com/apache/iceberg/pull/16147> is
> a draft with some of the non-Hive changes. The Hive distribution has about
> 100 more Jars than non-Hive, and includes many dependencies that are almost
> certainly unnecessary, like 3 hadoop-mapreduce-* Jars. *My recommendation
> is to hold off on making Kafka Connect part of releases until the license
> issues are solved*.
>
> Another issue is the open-api module. We added this to the Java build to
> verify the REST catalog spec, but then added tests and fixtures for
> validating REST implementations. #11279
> <https://github.com/apache/iceberg/pull/11279> added a runtime Jar to
> run a test service, but most PMC members I’ve talked to about it didn’t
> know that we have been publishing it, and have been since 1.7. This
> runtime Jar indiscriminately bundles far more libraries than it needs, like
> the cloud provider libs, Hadoop common, JUnit, Jetty, and others. The Jar
> is 200+ MB
> <https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-open-api/1.10.1/>.
> *My recommendation is to remove this Jar from publication to unblock
> releases*.
>
> As a general rule, when we are considering adding a new runtime
> distribution to the project, we need to check that it is something we need
> to do (vs. an easy alternative), and if it is, then minimize the
> dependencies included to only those required to run it. Once that’s done,
> we need to document the dependencies in LICENSE and NOTICE and, as of
> #15855 <https://github.com/apache/iceberg/pull/15855>, ensure that the
> bundled dependencies are tracked in a runtime-deps.txt file.
>
> I think the priority right now is to unblock the 1.11 and 1.10.2 releases.
> We can do that by not releasing these artifacts. After that, I think we
> need to verify that each of these is needed and has minimal included
> dependencies, and then document those dependencies. For example, do we need
> a Kafka Connect Hive distribution or is the REST catalog version enough?
> Does everyone agree that this is the right path forward?
>
> Thanks,
>
> Ryan
>
