Wenkai Qi created FLINK-36931:
---------------------------------
Summary: FlinkCDC YAML supports synchronizing the full amount of
data of the entire database in Batch mode
Key: FLINK-36931
URL: https://issues.apache.org/jira/browse/FLINK-36931
Project: Flink
Issue Type: New Feature
Components: Flink CDC
Reporter: Wenkai Qi
h1. Background
The MySQL CDC source in Flink CDC supports {*}StartupMode.SNAPSHOT{*}. In that mode the source is of
{*}Boundedness.BOUNDED{*}, so it can run in {*}RuntimeExecutionMode.BATCH{*}.
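
The relationship between startup mode, boundedness, and runtime mode can be sketched as a small toy model. This is an illustration only: the enums below are local stand-ins that borrow the names used in this issue, not Flink's own classes, and {{BatchEligibility}} with its methods is a hypothetical helper. It assumes that only a snapshot-only source is bounded and that batch execution requires a bounded source.

```java
// Local stand-ins that mirror the names in this issue; these are NOT Flink's classes.
enum StartupMode { INITIAL, SNAPSHOT, LATEST_OFFSET }
enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }
enum RuntimeExecutionMode { STREAMING, BATCH }

// Hypothetical helper, for illustration only.
final class BatchEligibility {

    // Assumption: a snapshot-only source ends after the snapshot, so it is bounded;
    // every other startup mode keeps reading the change log and is unbounded.
    static Boundedness boundednessOf(StartupMode mode) {
        return mode == StartupMode.SNAPSHOT
                ? Boundedness.BOUNDED
                : Boundedness.CONTINUOUS_UNBOUNDED;
    }

    // Streaming accepts any source; batch only accepts bounded sources.
    static boolean canRun(StartupMode startup, RuntimeExecutionMode runtime) {
        return runtime == RuntimeExecutionMode.STREAMING
                || boundednessOf(startup) == Boundedness.BOUNDED;
    }

    public static void main(String[] args) {
        System.out.println(canRun(StartupMode.SNAPSHOT, RuntimeExecutionMode.BATCH));
        System.out.println(canRun(StartupMode.INITIAL, RuntimeExecutionMode.BATCH));
    }
}
```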
h1. Expectation
Flink CDC YAML jobs should likewise support {*}StartupMode.SNAPSHOT{*}, be of
{*}Boundedness.BOUNDED{*}, and be able to run in {*}RuntimeExecutionMode.BATCH{*}.
h1. Benefits
# Jobs can take advantage of Flink's batch-mode performance optimizations (for
example, dynamic partition pruning and Hybrid Shuffle). Which batch-mode
optimizations to use still needs to be discussed.
# A full copy of an entire database can be synchronized as an offline (batch)
job, for example to backfill data. In the future, the same approach could
support full-database synchronization for other databases and data lakes.
h1. Under consideration
# Sink needs to switch to Batch mode.
[https://github.com/apache/flink-cdc/pull/3646#pullrequestreview-2491309306]
# Trigger a single checkpoint whose checkpoint ID is Long.MAX_VALUE, and have
the sink perform its final commit based on this ID.
# Sink directly supports Batch writing (such as DorisSink)
# ... (to be supplemented)
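
The single final checkpoint idea from the list above can be sketched as follows. This is a minimal toy model under stated assumptions, not Flink's sink API: {{BufferingSink}}, its methods, and the {{FINAL_CHECKPOINT_ID}} constant are hypothetical names. The sink buffers every record, ignores ordinary checkpoint notifications, and commits exactly once when it is notified with a checkpoint ID of Long.MAX_VALUE.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sink used only to illustrate the proposal; not a Flink interface.
final class BufferingSink {
    // Sentinel ID that marks the one and only "end of input" checkpoint.
    static final long FINAL_CHECKPOINT_ID = Long.MAX_VALUE;

    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();
    private boolean finished = false;

    // Records are only buffered here; nothing becomes visible yet.
    void write(String record) {
        if (finished) {
            throw new IllegalStateException("final commit already performed");
        }
        pending.add(record);
    }

    // Only the sentinel checkpoint ID triggers the commit, and it does so at most once.
    void notifyCheckpointComplete(long checkpointId) {
        if (checkpointId != FINAL_CHECKPOINT_ID || finished) {
            return;
        }
        committed.addAll(pending);
        pending.clear();
        finished = true;
    }

    List<String> committed() {
        return new ArrayList<>(committed); // defensive copy
    }

    int pendingCount() {
        return pending.size();
    }
}

final class BatchCommitDemo {
    public static void main(String[] args) {
        BufferingSink sink = new BufferingSink();
        sink.write("row-1");
        sink.write("row-2");
        sink.notifyCheckpointComplete(1L); // ordinary ID: ignored
        sink.notifyCheckpointComplete(BufferingSink.FINAL_CHECKPOINT_ID); // final commit
        System.out.println(sink.committed());
    }
}
```

Making the commit idempotent (the {{finished}} flag) keeps a repeated notification from committing twice; whether a real sink needs this depends on how the final checkpoint would be delivered, which the issue leaves open.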