[ 
https://issues.apache.org/jira/browse/FLINK-39245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Huang updated FLINK-39245:
-------------------------------
    Description: 
Motivation
 
Currently, the Iceberg pipeline connector only supports *hadoop* and *hive* 
catalog types. AWS Glue Data Catalog is widely used as the metastore for 
Iceberg tables on AWS, especially in Amazon EMR, EKS, and self-managed Flink 
deployments on EC2. Users who want to use Flink CDC to sync data into Iceberg 
tables managed by Glue Catalog are unable to do so with the current 
implementation.
 
Since Iceberg's _CatalogUtil.buildIcebergCatalog()_ already natively supports 
_type=glue_ (mapping to {_}org.apache.iceberg.aws.glue.GlueCatalog{_}), the 
Flink CDC Iceberg connector just needs to:
1. Add _iceberg-aws_ as a compile-time dependency
2. Expose the Glue-related configuration options through the pipeline config 
layer
3. Ensure the catalog properties are correctly passed through
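
Step 3 can be illustrated with a minimal sketch. It assumes the connector gathers every sink option under the `catalog.properties.` prefix and hands the remaining key to Iceberg unchanged. The class and method names are hypothetical and are not taken from the connector source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not a class from Flink CDC or Iceberg. */
final class CatalogPropertyPassThrough {

    /** Prefix used by the sink options in this issue. */
    static final String PREFIX = "catalog.properties.";

    /** Copies every option under PREFIX into a new map, with the prefix removed. */
    static Map<String, String> extractCatalogProperties(Map<String, String> sinkOptions) {
        Map<String, String> catalogProps = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : sinkOptions.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                catalogProps.put(key.substring(PREFIX.length()), entry.getValue());
            }
        }
        return catalogProps;
    }

    public static void main(String[] args) {
        Map<String, String> sinkOptions = new LinkedHashMap<>();
        sinkOptions.put("type", "iceberg"); // connector type, not a catalog property
        sinkOptions.put("catalog.properties.type", "glue");
        sinkOptions.put("catalog.properties.warehouse", "s3://my-bucket/warehouse/");
        sinkOptions.put("catalog.properties.client.region", "us-east-1");

        // The stripped keys (type, warehouse, client.region) are what Iceberg would receive.
        System.out.println(extractCatalogProperties(sinkOptions));
    }
}
```

A map built this way is the kind of input that _CatalogUtil.buildIcebergCatalog()_ accepts as its catalog options, so a `type` of `glue` would reach Iceberg without any Glue-specific handling in the connector.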
 

Proposed Changes
 
- Add the _iceberg-aws_ dependency (provided scope) to
flink-cdc-pipeline-connector-iceberg
- Add new configuration options in `IcebergDataSinkOptions`:
  - `catalog.properties.type`: extend the description to include `glue`
  - `catalog.properties.catalog-impl`: custom catalog implementation class
  - `catalog.properties.io-impl`: custom FileIO implementation (e.g. `S3FileIO`)
  - `catalog.properties.glue.id`: Glue Catalog ID for cross-account access
  - `catalog.properties.glue.skip-archive`: skip archiving older table versions
  - `catalog.properties.glue.skip-name-validation`: skip Glue name validation
  - `catalog.properties.client.region`: AWS region for the Glue client
- Register the new options in `IcebergDataSinkFactory`
- Add unit tests for Glue catalog DataSink creation
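
As a rough illustration of why declaring the Glue-specific keys is useful, the sketch below keeps the three `glue.*` options from the list above in one place and flags keys under that prefix that are not declared, such as a misspelled option. It uses plain Java collections only. The class, the method names, and the check itself are assumptions made for illustration; the real connector would declare options through Flink CDC's own option classes, which this sketch does not use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the option declarations; not the connector's real classes. */
final class GlueOptionSketch {

    static final String GLUE_PREFIX = "catalog.properties.glue.";

    /** Glue-specific option keys and descriptions, copied from the proposal. */
    static Map<String, String> declaredGlueOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("catalog.properties.glue.id", "Glue Catalog ID for cross-account access");
        options.put("catalog.properties.glue.skip-archive", "Skip archiving older table versions");
        options.put("catalog.properties.glue.skip-name-validation", "Skip Glue name validation");
        return options;
    }

    /** Returns the keys under GLUE_PREFIX that are not declared, e.g. misspelled ones. */
    static List<String> undeclaredGlueKeys(Map<String, String> sinkOptions) {
        Map<String, String> declared = declaredGlueOptions();
        List<String> unknown = new ArrayList<>();
        for (String key : sinkOptions.keySet()) {
            if (key.startsWith(GLUE_PREFIX) && !declared.containsKey(key)) {
                unknown.add(key);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        Map<String, String> sinkOptions = new LinkedHashMap<>();
        sinkOptions.put("catalog.properties.glue.skip-archive", "true");
        sinkOptions.put("catalog.properties.glue.skip_archive", "true"); // typo: underscore
        System.out.println(undeclaredGlueKeys(sinkOptions));
    }
}
```

The check only covers the keys named in this issue, so a real implementation would more likely warn about unknown keys than reject them.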
 

Usage Example
 
{code:yaml}
sink:
  type: iceberg
  catalog.properties.type: glue
  catalog.properties.warehouse: s3://my-bucket/warehouse/
  catalog.properties.io-impl: org.apache.iceberg.aws.s3.S3FileIO
  catalog.properties.client.region: us-east-1
{code}

  was:
## Motivation
 
Currently, the Iceberg pipeline connector only supports `hadoop` and `hive` 
catalog types. AWS Glue Data Catalog is widely used as the metastore for 
Iceberg tables on AWS, especially in Amazon EMR, EKS, and self-managed Flink 
deployments on EC2. Users who want to use Flink CDC to sync data into Iceberg 
tables managed by Glue Catalog are unable to do so with the current 
implementation.
 
Since Iceberg's `CatalogUtil.buildIcebergCatalog()` already natively supports 
`type=glue` (mapping to `org.apache.iceberg.aws.glue.GlueCatalog`), the Flink 
CDC Iceberg connector just needs to:
1. Add `iceberg-aws` as a compile-time dependency
2. Expose the Glue-related configuration options through the pipeline config 
layer
3. Ensure the catalog properties are correctly passed through
 
## Proposed Changes
 
- Add `iceberg-aws` dependency (provided scope) to
`flink-cdc-pipeline-connector-iceberg`
- Add new configuration options in `IcebergDataSinkOptions`:
  - `catalog.properties.type`: extend description to include `glue`
  - `catalog.properties.catalog-impl`: custom catalog implementation class
  - `catalog.properties.io-impl`: custom FileIO implementation (e.g. `S3FileIO`)
  - `catalog.properties.glue.id`: Glue Catalog ID for cross-account access
  - `catalog.properties.glue.skip-archive`: skip archiving older table versions
  - `catalog.properties.glue.skip-name-validation`: skip Glue name validation
  - `catalog.properties.client.region`: AWS region for the Glue client
- Register new options in `IcebergDataSinkFactory`
- Add unit tests for Glue catalog DataSink creation
 
## Usage Example
 
```yaml
sink:
  type: iceberg
  catalog.properties.type: glue
  catalog.properties.warehouse: s3://my-bucket/warehouse/
  catalog.properties.io-impl: org.apache.iceberg.aws.s3.S3FileIO
  catalog.properties.client.region: us-east-1
```


> Support AWS Glue Catalog for Iceberg pipeline connector
> -------------------------------------------------------
>
>                 Key: FLINK-39245
>                 URL: https://issues.apache.org/jira/browse/FLINK-39245
>             Project: Flink
>          Issue Type: New Feature
>          Components: Flink CDC
>         Environment: The Iceberg AWS bundle (iceberg-aws-bundle or 
> equivalent) and AWS SDK must be available in the runtime classpath. On Amazon 
> EMR, the bundled iceberg-flink-runtime already includes Glue Catalog support. 
> For non-EMR environments, users need to add iceberg-aws-bundle-<version>.jar 
> to the Flink lib/ directory.
>            Reporter: Xiao Huang
>            Priority: Minor
>
> Motivation
>  
> Currently, the Iceberg pipeline connector only supports *hadoop* and *hive* 
> catalog types. AWS Glue Data Catalog is widely used as the metastore for 
> Iceberg tables on AWS, especially in Amazon EMR, EKS, and self-managed Flink 
> deployments on EC2. Users who want to use Flink CDC to sync data into Iceberg 
> tables managed by Glue Catalog are unable to do so with the current 
> implementation.
>  
> Since Iceberg's _CatalogUtil.buildIcebergCatalog()_ already natively supports 
> _type=glue_ (mapping to {_}org.apache.iceberg.aws.glue.GlueCatalog{_}), the 
> Flink CDC Iceberg connector just needs to:
> 1. Add _iceberg-aws_ as a compile-time dependency
> 2. Expose the Glue-related configuration options through the pipeline config 
> layer
> 3. Ensure the catalog properties are correctly passed through
>  
> Proposed Changes
>  
> - Add the _iceberg-aws_ dependency (provided scope) to
> flink-cdc-pipeline-connector-iceberg
> - Add new configuration options in `IcebergDataSinkOptions`:
>   - `catalog.properties.type`: extend the description to include `glue`
>   - `catalog.properties.catalog-impl`: custom catalog implementation class
>   - `catalog.properties.io-impl`: custom FileIO implementation (e.g. `S3FileIO`)
>   - `catalog.properties.glue.id`: Glue Catalog ID for cross-account access
>   - `catalog.properties.glue.skip-archive`: skip archiving older table versions
>   - `catalog.properties.glue.skip-name-validation`: skip Glue name validation
>   - `catalog.properties.client.region`: AWS region for the Glue client
> - Register the new options in `IcebergDataSinkFactory`
> - Add unit tests for Glue catalog DataSink creation
>  
> Usage Example
>  
> {code:yaml}
> sink:
>   type: iceberg
>   catalog.properties.type: glue
>   catalog.properties.warehouse: s3://my-bucket/warehouse/
>   catalog.properties.io-impl: org.apache.iceberg.aws.s3.S3FileIO
>   catalog.properties.client.region: us-east-1
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
