dejii opened a new issue, #37614:
URL: https://github.com/apache/beam/issues/37614
### What would you like to happen?
The Apache Beam Iceberg connector currently only supports Google Cloud
Storage (GCS) out of the box. The expansion service JAR includes `iceberg-gcp`,
but does not bundle `iceberg-aws`, which is required for AWS S3 and
S3-compatible storage backends such as MinIO and Supabase Storage.
Attempting to write to an S3-compatible destination using the current
expansion service results in the following error when running on Dataflow:
```
Error message from worker: org.apache.beam.sdk.util.UserCodeException:
java.lang.IllegalArgumentException: Cannot initialize FileIO implementation
org.apache.iceberg.aws.s3.S3FileIO: Cannot find constructor for interface
org.apache.iceberg.io.FileIO
Missing org.apache.iceberg.aws.s3.S3FileIO
[java.lang.ClassNotFoundException: org.apache.iceberg.aws.s3.S3FileIO]
```
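The `ClassNotFoundException` above can be reproduced without submitting a job. Below is a minimal sketch of a classpath probe. The `FileIoClasspathCheck` class and its method are hypothetical helpers, not part of Beam or Iceberg. The result is only meaningful when the probe runs with the same classpath as the expansion service JAR.

```java
// Hypothetical diagnostic helper; not part of Beam or Iceberg.
final class FileIoClasspathCheck {
    // Class name copied from the stack trace above.
    static final String S3_FILE_IO = "org.apache.iceberg.aws.s3.S3FileIO";

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, FileIoClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies fail to link.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(S3_FILE_IO + " available: " + isOnClasspath(S3_FILE_IO));
    }
}
```

If the probe reports `false` when run against the expansion service JAR, that is consistent with `iceberg-aws` not being bundled.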
Current workarounds:
- Building and deploying a custom expansion service JAR that includes
`iceberg-aws`
- Using `IcebergIO` directly (which is generally discouraged in favor of using
Managed IO)
### Proposal
Bundle `iceberg-aws` with the official Iceberg expansion service JAR to enable
native S3 and S3-compatible storage support.
This would allow writing to S3-compatible destinations using a REST-based
catalog configuration such as:
```java
// Assumes Guava's ImmutableMap and a user-defined `options` object exposing these getters.
ImmutableMap<String, String> catalogProperties =
    ImmutableMap.<String, String>builder()
        // REST catalog connection
        .put("type", "rest")
        .put("uri", options.getCatalogUri())
        .put("token", options.getCatalogToken())
        .put("warehouse", options.getWarehouse())
        // S3 / S3-compatible storage settings (need iceberg-aws on the classpath)
        .put("client.region", "us-east-1")
        .put("s3.endpoint", options.getS3Endpoint())
        .put("s3.access-key-id", options.getS3AccessKeyId())
        .put("s3.secret-access-key", options.getS3SecretAccessKey())
        .put("s3.path-style-access", "true")
        .put("s3.remote-signing-enabled", "false")
        .build();
```
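For context, here is a rough sketch of how catalog properties like these would be nested into a Managed IO configuration map. It assumes the Managed Iceberg config keys `table`, `catalog_name`, and `catalog_properties`. The class name, table name, and catalog name are placeholders, not values from this issue. Only the JDK is used, so it runs without Beam on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Beam.
final class ManagedIcebergConfigSketch {

    /** Nests the catalog properties under the keys a Managed Iceberg config is assumed to expect. */
    static Map<String, Object> buildConfig(
            String table, String catalogName, Map<String, String> catalogProperties) {
        Map<String, Object> config = new LinkedHashMap<>();
        config.put("table", table);
        config.put("catalog_name", catalogName);
        config.put("catalog_properties", catalogProperties);
        return config;
    }

    public static void main(String[] args) {
        // Placeholder values; a real pipeline would read these from its options.
        Map<String, String> catalogProperties = new LinkedHashMap<>();
        catalogProperties.put("type", "rest");
        catalogProperties.put("s3.path-style-access", "true");

        Map<String, Object> config =
            buildConfig("db.events", "rest_catalog", catalogProperties);
        System.out.println(config);

        // With Beam on the classpath, the map would then be handed to Managed IO, e.g.:
        //   rows.apply(Managed.write(Managed.ICEBERG).withConfig(config));
    }
}
```

With the current expansion service, a configuration along these lines is expected to fail at the `S3FileIO` lookup shown in the error above, because the class is resolved on the expansion service side.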
### Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
### Issue Components
- [ ] Component: Python SDK
- [x] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner