This is an automated email from the ASF dual-hosted git repository.
ajantha pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/polaris-tools.git
The following commit(s) were added to refs/heads/main by this push:
new f4b9a91 Catalog Migrator: Remove AWS SDK dependencies for runtime (#133)
f4b9a91 is described below
commit f4b9a913420095ae9d5d026c7ec55bfec8d3f5d6
Author: Ajantha Bhat <[email protected]>
AuthorDate: Wed Jan 14 15:54:41 2026 +0530
Catalog Migrator: Remove AWS SDK dependencies for runtime (#133)
The runtime jar was 650 MB because of these dependencies. It is now around 118 MB
(it still packs hadoop-aws).
The initial intent was a standalone jar, but it only bundled the AWS SDK. A jar that
is standalone for all systems would also need the GCP and Azure dependencies, which
would bloat the runtime jar further. Hence the dependencies are excluded, and users
need to provide them on the classpath based on the storage type (similar to the
Iceberg runtime jars).
---
iceberg-catalog-migrator/cli/BUNDLE-LICENSE | 7 ----
iceberg-catalog-migrator/cli/BUNDLE-NOTICE | 29 -------------
iceberg-catalog-migrator/cli/build.gradle.kts | 12 +-----
.../docs/object-store-access-configuration.md | 49 ++++++++++++++++++++--
iceberg-catalog-migrator/gradle/libs.versions.toml | 10 -----
5 files changed, 47 insertions(+), 60 deletions(-)
diff --git a/iceberg-catalog-migrator/cli/BUNDLE-LICENSE b/iceberg-catalog-migrator/cli/BUNDLE-LICENSE
index d7b39e5..47a8767 100644
--- a/iceberg-catalog-migrator/cli/BUNDLE-LICENSE
+++ b/iceberg-catalog-migrator/cli/BUNDLE-LICENSE
@@ -1244,13 +1244,6 @@ License: Apache License, Version 2.0 -
http://www.apache.org/licenses/LICENSE-2.
--------------------------------------------------------------------------------
-This artifact bundles Amazon AWS SDK.
-
-Project URL: https://aws.amazon.com/sdkforjava
-License: Apache License, Version 2.0 - http://www.apache.org/licenses/LICENSE-2.0.txt
-
---------------------------------------------------------------------------------
-
This artifact bundles Stax API.
Project URL: http://stax.codehaus.org/
diff --git a/iceberg-catalog-migrator/cli/BUNDLE-NOTICE b/iceberg-catalog-migrator/cli/BUNDLE-NOTICE
index 427a17d..556a7d1 100644
--- a/iceberg-catalog-migrator/cli/BUNDLE-NOTICE
+++ b/iceberg-catalog-migrator/cli/BUNDLE-NOTICE
@@ -133,33 +133,4 @@ This artifact bundles Project Nessie with the following in its NOTICE:
| Nessie
| Copyright 2015-2025 Dremio Corporation
--------------------------------------------------------------------------
-
-This artifact bundles Amazon AWS SDK with the following in its NOTICE:
-| AWS SDK for Java 2.0
-| Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
-|
-| This product includes software developed by
-| Amazon Technologies, Inc (http://www.amazon.com/).
-|
-| **********************
-| THIRD PARTY COMPONENTS
-| **********************
-| This software includes third party software subject to the following copyrights:
-| - XML parsing and utility functions from JetS3t - Copyright 2006-2009 James Murty.
-| - PKCS#1 PEM encoded private key parsing and utility functions from oauth.googlecode.com - Copyright 1998-2010 AOL Inc.
-| - Apache Commons Lang - https://github.com/apache/commons-lang
-| - Netty Reactive Streams - https://github.com/playframework/netty-reactive-streams
-| - Jackson-core - https://github.com/FasterXML/jackson-core
-| - Jackson-dataformat-cbor - https://github.com/FasterXML/jackson-dataformats-binary
-|
-| The licenses for these third party components are included in LICENSE.txt
-|
-| - For Apache Commons Lang see also this required NOTICE:
-| Apache Commons Lang
-| Copyright 2001-2020 The Apache Software Foundation
-|
-| This product includes software developed at
-| The Apache Software Foundation (https://www.apache.org/).
-
-------------------------------------------------------------------------
\ No newline at end of file
diff --git a/iceberg-catalog-migrator/cli/build.gradle.kts b/iceberg-catalog-migrator/cli/build.gradle.kts
index d9d4bf5..8330a77 100644
--- a/iceberg-catalog-migrator/cli/build.gradle.kts
+++ b/iceberg-catalog-migrator/cli/build.gradle.kts
@@ -47,17 +47,7 @@ dependencies {
implementation("org.apache.iceberg:iceberg-hive-metastore")
implementation("org.apache.iceberg:iceberg-nessie")
implementation("org.apache.iceberg:iceberg-dell")
- implementation(libs.hadoop.aws) { exclude("com.amazonaws", "aws-java-sdk-bundle") }
- // AWS dependencies based on https://iceberg.apache.org/docs/latest/aws/#enabling-aws-integration
- runtimeOnly(libs.aws.sdk.apache.client)
- runtimeOnly(libs.aws.sdk.auth)
- runtimeOnly(libs.aws.sdk.glue)
- runtimeOnly(libs.aws.sdk.s3)
- runtimeOnly(libs.aws.sdk.dynamo)
- runtimeOnly(libs.aws.sdk.kms)
- runtimeOnly(libs.aws.sdk.lakeformation)
- runtimeOnly(libs.aws.sdk.sts)
- runtimeOnly(libs.aws.sdk.url.connection.client)
+ implementation(libs.hadoop.aws) { exclude(group = "software.amazon.awssdk") }
// needed for Hive catalog
runtimeOnly("org.apache.hive:hive-metastore:${libs.versions.hive.get()}") {
diff --git a/iceberg-catalog-migrator/docs/object-store-access-configuration.md b/iceberg-catalog-migrator/docs/object-store-access-configuration.md
index 3f984e7..5bc3fe8 100644
--- a/iceberg-catalog-migrator/docs/object-store-access-configuration.md
+++ b/iceberg-catalog-migrator/docs/object-store-access-configuration.md
@@ -21,16 +21,59 @@
This document provides a guide on how to configure access to object stores for the Iceberg Catalog Migrator.
+## Required Dependencies
+
+The Iceberg Catalog Migrator CLI jar does not include cloud provider dependencies to keep the distribution size small.
+Users must supplement the appropriate Iceberg object store bundle jar based on the object store being used.
+
+Download the required bundle jar from [Maven Central](https://repo1.maven.org/maven2/org/apache/iceberg/)
+
## AWS S3
-For AWS, you can use the following environment variables:
+
+### Required Dependencies
+Users must include the Iceberg AWS bundle jar (can be downloaded from [here](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle)) in the classpath:
+```shell
+java -cp iceberg-catalog-migrator-cli-0.1.0-SNAPSHOT.jar:iceberg-aws-bundle-x.x.x.jar \
+ org.apache.polaris.iceberg.catalog.migrator.cli.CatalogMigrationCLI register \
+ [your-options]
+```
+
+For more information on AWS integration, refer to the [Iceberg AWS documentation](https://iceberg.apache.org/docs/nightly/aws/#enabling-aws-integration).
+
+### Environment Variables
+For AWS, use the following environment variables:
```shell
export AWS_ACCESS_KEY_ID=xxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxx
export AWS_S3_ENDPOINT=xxxxxxx
```
-## ADLS
-For ADLS, you can use the following environment variables:
+## Azure Data Lake Storage (ADLS)
+
+### Required Dependencies
+Users must include the Iceberg Azure bundle jar (can be downloaded from [here](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-azure-bundle)) in the classpath:
+```shell
+java -cp iceberg-catalog-migrator-cli-0.1.0.jar:iceberg-azure-bundle-x.x.x.jar \
+ org.apache.polaris.iceberg.catalog.migrator.cli.CatalogMigrationCLI register \
+ [your-options]
+```
+
+### Environment Variables
+For ADLS, use the following environment variables:
```shell
export AZURE_SAS_TOKEN=xxxxxxx
```
+
+## Google Cloud Storage (GCS)
+
+### Required Dependencies
+Users must include the Iceberg GCP bundle jar (can be downloaded from [here](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-gcp-bundle)) in the classpath:
+```shell
+java -cp iceberg-catalog-migrator-cli-0.1.0.jar:iceberg-gcp-bundle-x.x.x.jar \
+ org.apache.polaris.iceberg.catalog.migrator.cli.CatalogMigrationCLI register \
+ [your-options]
+```
+
+## Notes
+- Replace `x.x.x` with the Iceberg version matching the release version of the migrator tool.
+- Multiple bundle jars can be included if users need to access multiple cloud providers.
diff --git a/iceberg-catalog-migrator/gradle/libs.versions.toml b/iceberg-catalog-migrator/gradle/libs.versions.toml
index 98b2b1f..e18f812 100644
--- a/iceberg-catalog-migrator/gradle/libs.versions.toml
+++ b/iceberg-catalog-migrator/gradle/libs.versions.toml
@@ -19,7 +19,6 @@
[versions]
assertj = "3.27.3"
-aws = "2.33.0" # this is in mapping with iceberg repo.
checkstyle = "10.21.3"
errorprone = "2.36.0"
errorproneSlf4j = "0.1.28"
@@ -43,15 +42,6 @@ testcontainers = "1.21.3"
[libraries]
assertj = { module = "org.assertj:assertj-core", version.ref = "assertj" }
-aws-sdk-apache-client = { module = "software.amazon.awssdk:apache-client", version.ref = "aws" }
-aws-sdk-auth = { module = "software.amazon.awssdk:auth", version.ref = "aws" }
-aws-sdk-dynamo = { module = "software.amazon.awssdk:dynamodb", version.ref = "aws" }
-aws-sdk-glue = { module = "software.amazon.awssdk:glue", version.ref = "aws" }
-aws-sdk-kms = { module = "software.amazon.awssdk:kms", version.ref = "aws" }
-aws-sdk-lakeformation = { module = "software.amazon.awssdk:lakeformation", version.ref = "aws" }
-aws-sdk-sts = { module = "software.amazon.awssdk:sts", version.ref = "aws" }
-aws-sdk-s3 = { module = "software.amazon.awssdk:s3", version.ref = "aws" }
-aws-sdk-url-connection-client = { module = "software.amazon.awssdk:url-connection-client", version.ref = "aws" }
checkstyle = { module = "com.puppycrawl.tools:checkstyle", version.ref = "checkstyle" }
errorprone-annotations = { module = "com.google.errorprone:error_prone_annotations", version.ref = "errorprone" }
errorprone-core = { module = "com.google.errorprone:error_prone_core", version.ref = "errorprone" }