[ https://issues.apache.org/jira/browse/FLINK-36931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Leonard Xu reassigned FLINK-36931:
----------------------------------
Assignee: Wenkai Qi
> FlinkCDC YAML supports synchronizing the full amount of data of the entire
> database in Batch mode
> -------------------------------------------------------------------------------------------------
>
> Key: FLINK-36931
> URL: https://issues.apache.org/jira/browse/FLINK-36931
> Project: Flink
> Issue Type: New Feature
> Components: Flink CDC
> Reporter: Wenkai Qi
> Assignee: Wenkai Qi
> Priority: Major
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> h1. Background
> The MySQL CDC source in Flink CDC supports *StartupMode.SNAPSHOT*. In that
> mode it reports {*}Boundedness.BOUNDED{*} and can run in
> {*}RuntimeExecutionMode.BATCH{*}.
> h1. Expectation
> Flink CDC YAML jobs should also support {*}StartupMode.SNAPSHOT{*}, report
> {*}Boundedness.BOUNDED{*}, and run in {*}RuntimeExecutionMode.BATCH{*}.
> h1. Benefits
>
> # Jobs can take advantage of Flink batch-mode performance optimizations
> (for example dynamic partition pruning and Hybrid Shuffle). Which of these
> optimizations to use still needs to be discussed.
> # The full data of an entire database can be synchronized as an offline
> (batch) job, for example to backfill data. In the future, this could be
> extended to full-database synchronization for other databases and data
> lakes.
> h1. Under consideration
>
> # Sink needs to switch to Batch mode.
> [https://github.com/apache/flink-cdc/pull/3646#pullrequestreview-2491309306]
> # For a two-phase-commit (2PC) sink, trigger a single checkpoint with
> checkpoint ID Long.MAX_VALUE; the sink should perform its final commit based
> on this ID.
> # Sink directly supports Batch writing (such as DorisSink)
> # ... (to be supplemented)
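
As a rough illustration of the 2PC item above, the following self-contained Java sketch models a sink that stages records and publishes them only when one final checkpoint with ID Long.MAX_VALUE is committed. It does not use any Flink or Flink CDC API: `ToySink`, `prepareCommit`, `commit`, and `runBoundedJob` are hypothetical names invented for this example, and the actual design is still under discussion in the issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a 2PC sink in a bounded job; not a Flink API. */
class BatchTwoPhaseCommitSketch {

    /** Assumed ID of the single, final checkpoint of a bounded (batch) job. */
    static final long FINAL_CHECKPOINT_ID = Long.MAX_VALUE;

    /** Hypothetical sink: records are staged first and become visible only on commit. */
    static final class ToySink {
        private final List<String> staged = new ArrayList<>();
        private final List<String> committed = new ArrayList<>();
        private long preparedCheckpointId = -1L;

        /** Buffers a record without making it visible. */
        void write(String record) {
            staged.add(record);
        }

        /** Phase 1: remember which checkpoint the staged records belong to. */
        void prepareCommit(long checkpointId) {
            preparedCheckpointId = checkpointId;
        }

        /** Phase 2: publish staged records, but only for the prepared checkpoint ID. */
        void commit(long checkpointId) {
            if (checkpointId != preparedCheckpointId) {
                throw new IllegalStateException(
                        "No prepared commit for checkpoint " + checkpointId);
            }
            committed.addAll(staged);
            staged.clear();
        }

        /** Returns an immutable copy of the records that are visible so far. */
        List<String> committed() {
            return List.copyOf(committed);
        }
    }

    /** Writes all snapshot rows, then commits exactly once with the final checkpoint ID. */
    static List<String> runBoundedJob(List<String> snapshotRows) {
        ToySink sink = new ToySink();
        for (String row : snapshotRows) {
            sink.write(row);
        }
        // End of input: a single checkpoint drives the final commit.
        sink.prepareCommit(FINAL_CHECKPOINT_ID);
        sink.commit(FINAL_CHECKPOINT_ID);
        return sink.committed();
    }

    public static void main(String[] args) {
        System.out.println(runBoundedJob(List.of("row-1", "row-2")));
    }
}
```

In this sketch nothing is visible downstream until the final commit, which is the property the single Long.MAX_VALUE checkpoint is meant to provide for a batch run.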
--
This message was sent by Atlassian Jira
(v8.20.10#820010)