[
https://issues.apache.org/jira/browse/FLINK-39238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-39238:
-----------------------------------
Labels: pull-request-available (was: )
> Support watermark alignment in flink-connector-kafka in dynamic Kafka source
> mode
> ---------------------------------------------------------------------------------
>
> Key: FLINK-39238
> URL: https://issues.apache.org/jira/browse/FLINK-39238
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Kafka
> Reporter: Xin Gao
> Priority: Critical
> Labels: pull-request-available
>
> Flink source ingestion is with watermark alignment config like
> `scan.watermark.alignment.max-drift`. The operator would then control
> * checkWatermarkAlignment for whole reader
> * checkSplitWatermarkAlignment for the split reader
> In dynamic Kafka source mode, the split reader pause/resume is missed
> (`pauseOrResumeSplits` not implemented) and the exceptions might raise once
> such config enabled.
> This feature is important for workflows like ingesting data from Kafka to
> Data Lake. Missing the alignment could create far more number of small files
> once the ingestion progress diverges and runs our of controls.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)