norrishuang opened a new pull request, #4314:
URL: https://github.com/apache/flink-cdc/pull/4314
## Summary
Add AWS Glue Catalog support for the Iceberg pipeline sink connector.
Previously, the Iceberg pipeline only supported `hadoop` and `hive` catalog
types. This PR enables users to use AWS Glue Data Catalog as the Iceberg
catalog by setting `catalog.properties.type: glue`.
## Changes
### New Configuration Options
| Option | Type | Description |
|--------|------|-------------|
| `catalog.properties.type` | String | Now supports `glue` in addition to `hadoop` and `hive` |
| `catalog.properties.catalog-impl` | String | Custom catalog implementation class (e.g. `org.apache.iceberg.aws.glue.GlueCatalog`) |
| `catalog.properties.io-impl` | String | Custom FileIO implementation (e.g. `org.apache.iceberg.aws.s3.S3FileIO`) |
| `catalog.properties.glue.id` | String | Glue Catalog ID (AWS account ID) for cross-account access |
| `catalog.properties.glue.skip-archive` | Boolean | Skip archiving older table versions in Glue (default: true) |
| `catalog.properties.glue.skip-name-validation` | Boolean | Skip name validation for Glue catalog (default: false) |
| `catalog.properties.client.region` | String | AWS region for the Glue catalog client |
### Files Modified
- `IcebergDataSinkOptions.java` — Added Glue-related config options, updated
`TYPE` and `WAREHOUSE` descriptions
- `IcebergDataSinkFactory.java` — Registered new optional config options
- `IcebergDataSinkFactoryTest.java` — Added 2 test cases for Glue catalog
creation (via `type=glue` and `catalog-impl`)
- `pom.xml` — Dependency and shade plugin adjustments
## Usage Example
```yaml
sink:
  type: iceberg
  catalog.properties.type: glue
  catalog.properties.warehouse: s3://my-bucket/warehouse/
  catalog.properties.io-impl: org.apache.iceberg.aws.s3.S3FileIO
  catalog.properties.client.region: us-east-1
```
## How It Works
Iceberg's `CatalogUtil.buildIcebergCatalog()` natively supports `type=glue` and
automatically loads `org.apache.iceberg.aws.glue.GlueCatalog`. This PR exposes
the necessary configuration options through the Flink CDC pipeline config layer
and ensures that the Glue-related catalog properties are passed through
via the `catalog.properties.*` prefix.
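
The prefix pass-through can be illustrated with a minimal, standalone sketch. This is not the connector's actual code: the class name `CatalogPropertiesSketch` and the method `extractCatalogProperties` are hypothetical and exist only for illustration. The sketch assumes the sink options arrive as a flat string map and that every key starting with `catalog.properties.` is forwarded to the Iceberg catalog with the prefix removed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a hypothetical helper, not part of the connector. */
public class CatalogPropertiesSketch {

    // Prefix used by the pipeline config for catalog-level settings.
    static final String PREFIX = "catalog.properties.";

    /**
     * Returns the entries whose keys start with PREFIX, with the prefix
     * stripped. All other sink options are ignored.
     */
    public static Map<String, String> extractCatalogProperties(
            Map<String, String> sinkOptions) {
        Map<String, String> catalogProps = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : sinkOptions.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(PREFIX)) {
                catalogProps.put(key.substring(PREFIX.length()), entry.getValue());
            }
        }
        return catalogProps;
    }

    public static void main(String[] args) {
        // Same options as the usage example in this PR description.
        Map<String, String> sinkOptions = new LinkedHashMap<>();
        sinkOptions.put("type", "iceberg");
        sinkOptions.put("catalog.properties.type", "glue");
        sinkOptions.put("catalog.properties.warehouse", "s3://my-bucket/warehouse/");
        sinkOptions.put("catalog.properties.io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
        sinkOptions.put("catalog.properties.client.region", "us-east-1");

        // The sink-level "type: iceberg" is dropped; only catalog keys remain.
        System.out.println(extractCatalogProperties(sinkOptions));
    }
}
```

With the options from the usage example, the resulting map holds the keys `type`, `warehouse`, `io-impl`, and `client.region`. A map of this shape is what Iceberg's `CatalogUtil.buildIcebergCatalog()` takes as its options argument, where `type=glue` selects the Glue catalog implementation.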
## Testing
- Unit tests pass (6/6 in `IcebergDataSinkFactoryTest`)
- Verified end-to-end on Amazon EMR (Flink 1.20, Iceberg 1.10.0-amzn) with
  MySQL CDC → Iceberg (Glue Catalog + S3)