dejii opened a new issue, #37614:
URL: https://github.com/apache/beam/issues/37614

   ### What would you like to happen?
   
   The Apache Beam Iceberg connector currently supports only Google Cloud 
Storage (GCS) out of the box. The expansion service JAR includes `iceberg-gcp` 
but does not bundle `iceberg-aws`, which is required for AWS S3 and 
S3-compatible storage backends (e.g., MinIO or Supabase Storage).
   
   Attempting to write to an S3-compatible destination using the current 
expansion service results in the following error when running on Dataflow:
   ```
   Error message from worker: org.apache.beam.sdk.util.UserCodeException: 
java.lang.IllegalArgumentException: Cannot initialize FileIO implementation 
org.apache.iceberg.aws.s3.S3FileIO: Cannot find constructor for interface 
org.apache.iceberg.io.FileIO
   
   Missing org.apache.iceberg.aws.s3.S3FileIO 
[java.lang.ClassNotFoundException: org.apache.iceberg.aws.s3.S3FileIO]
   ```
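
The `ClassNotFoundException` above suggests the class is simply absent from the worker classpath. As a minimal diagnostic sketch (not part of Beam or Iceberg; the class name `S3FileIOClasspathCheck` and the helper `isOnClasspath` are hypothetical), the following can be run against the same classpath as a given expansion service JAR to check whether `org.apache.iceberg.aws.s3.S3FileIO` is loadable:

```java
public class S3FileIOClasspathCheck {
    // Fully qualified class name taken from the stack trace in this issue.
    static final String S3_FILE_IO = "org.apache.iceberg.aws.s3.S3FileIO";

    /**
     * Returns true if the named class can be located by this class's
     * class loader. The class is not initialized, only looked up.
     */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, S3FileIOClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(S3_FILE_IO + " present: " + isOnClasspath(S3_FILE_IO));
    }
}
```

If this reports that the class is absent when run with the stock expansion service JAR on the classpath, that would be consistent with `iceberg-aws` not being bundled.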
   
   Current workarounds:
   - Building and deploying a custom expansion service JAR that includes 
`iceberg-aws`
   - Using `IcebergIO` directly (which is generally discouraged in favor of 
using Managed IO)
   
   ### Proposal
   Bundle `iceberg-aws` with the official Iceberg expansion service JAR to 
enable native S3 and S3-compatible storage support.
   
   This would allow writing to S3-compatible destinations using a REST-based 
catalog configuration such as:
   ```java
   ImmutableMap<String, String> catalogProperties =
       ImmutableMap.<String, String>builder()
       .put("type", "rest")
       .put("uri", options.getCatalogUri())
       .put("token", options.getCatalogToken())
       .put("warehouse", options.getWarehouse())
       .put("client.region", "us-east-1")
       .put("s3.endpoint", options.getS3Endpoint())
       .put("s3.access-key-id", options.getS3AccessKeyId())
       .put("s3.secret-access-key", options.getS3SecretAccessKey())
       .put("s3.path-style-access", "true")
       .put("s3.remote-signing-enabled", "false")
       .build();
   ```
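
For context, here is a rough sketch of how catalog properties like these would typically be wrapped into a Managed IO configuration map. It uses only the JDK so it can be run standalone. The top-level keys `table`, `catalog_name`, and `catalog_properties` are, to my understanding, the ones the Managed Iceberg transform expects; the class name, the helper `buildConfig`, and the values `db.events` and `my_catalog` are placeholders, not values from this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ManagedIcebergConfigSketch {
    /**
     * Builds the top-level configuration map. In a real pipeline this map
     * would be handed to Managed.write(Managed.ICEBERG).withConfig(...);
     * that call is omitted here so the sketch needs no Beam dependency.
     */
    static Map<String, Object> buildConfig(
            String table, String catalogName, Map<String, String> catalogProperties) {
        Map<String, Object> config = new LinkedHashMap<>();
        config.put("table", table);
        config.put("catalog_name", catalogName);
        config.put("catalog_properties", catalogProperties);
        return config;
    }

    public static void main(String[] args) {
        // A subset of the properties from the snippet above.
        Map<String, String> catalogProperties = new LinkedHashMap<>();
        catalogProperties.put("type", "rest");
        catalogProperties.put("client.region", "us-east-1");
        catalogProperties.put("s3.path-style-access", "true");

        // "db.events" and "my_catalog" are hypothetical placeholder values.
        Map<String, Object> config = buildConfig("db.events", "my_catalog", catalogProperties);
        System.out.println(config);
    }
}
```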
   
   ### Issue Priority
   
   Priority: 2 (default / most feature requests should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [x] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner

