As far as I know, after a Flink CDC table schema change, a broadcast event
is sent that causes the other operators to pause their data synchronization
tasks and wait for the schema change to succeed before continuing. Data
synchronization is therefore suspended for the duration of the schema
change, and a large backlog of data can accumulate, which may eventually
cause an avalanche and lead to serious performance problems. We currently
have two optimization suggestions:
1. Incremental data buffer
Use an incremental data buffer to temporarily store CDC data-change events
until the schema change has completed.
Implementation method:
Implement a buffer that temporarily stores CDC data changes.
After the schema change completes, process the buffered data in order and
submit it to the target database (it can be written in a single batch).
2. Optimize schema change operations
Try to reduce the time and frequency of schema changes.
Implementation method:
Merge multiple schema change operations to reduce how often they are
performed.
Use database-specific optimization techniques (such as MySQL's Online DDL)
to speed up the schema change itself.
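The buffering idea in suggestion 1 can be sketched roughly as follows. This
is a minimal illustration only, not Flink CDC's actual API: the class and
method names (SchemaChangeBuffer, onSchemaChangeStart, etc.) are
hypothetical, and events are simplified to strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: hold data-change events while a schema change is in
// flight, then flush them in arrival order once the change completes, so
// upstream operators never have to block.
public class SchemaChangeBuffer {
    private final Queue<String> pending = new ArrayDeque<>();
    private boolean schemaChangeInFlight = false;

    // Called when a schema change broadcast is received.
    public void onSchemaChangeStart() {
        schemaChangeInFlight = true;
    }

    // Returns the events that may be emitted downstream right now; while a
    // schema change is in flight, the event is buffered and nothing is emitted.
    public List<String> onDataChange(String event) {
        if (schemaChangeInFlight) {
            pending.add(event);
            return List.of();
        }
        return List.of(event);
    }

    // Called when the schema change succeeds: drain the buffer in order so the
    // backlog can be submitted to the target database as one batch.
    public List<String> onSchemaChangeFinish() {
        schemaChangeInFlight = false;
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }
}
```

In a real connector the buffer would also need bounds (spill to state or
apply backpressure) so it cannot grow without limit during a long DDL.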
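As a rough illustration of suggestion 2, several column additions can be
merged into one ALTER TABLE statement so the database validates or rebuilds
the table once instead of once per change; MySQL's Online DDL clauses
(ALGORITHM=INPLACE, LOCK=NONE) can additionally keep DML flowing during the
change. The DdlMerger helper below is hypothetical, purely to show the shape
of the generated SQL.

```java
import java.util.List;

// Hypothetical helper: merge multiple ADD COLUMN operations into a single
// MySQL ALTER TABLE statement with Online DDL hints appended.
public class DdlMerger {
    public static String mergeAddColumns(String table, List<String> columnDefs) {
        // One ALTER TABLE with comma-separated specifications; ALGORITHM and
        // LOCK ask MySQL to perform the change in place without blocking DML.
        return "ALTER TABLE " + table
                + " ADD COLUMN " + String.join(", ADD COLUMN ", columnDefs)
                + ", ALGORITHM=INPLACE, LOCK=NONE";
    }
}
```

Whether INPLACE/LOCK=NONE is actually supported depends on the specific
operation and MySQL version, so a real implementation would need to fall
back when the server rejects the requested algorithm.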

Yanquan Lv <decq12y...@gmail.com> wrote on Wed, May 22, 2024, at 21:11:

> Thanks Jerry for driving this; a JDBC sink for the CDC pipeline is indeed
> in high demand in the community.
>
> I have one concern:
> Some databases accessed via JDBC, such as MySQL, can take a long time to
> perform schema changes, and Flink CDC will not send DataChangeEvent during
> this period, which can cause significant latency in delivering CDC data
> changes. You may need to consider and explain how to improve this
> situation.
>
> Jerry <kiss...@gmail.com> wrote on Wed, May 15, 2024, at 15:07:
>
> > Hi all,
> > My name is ZhengjunZhou, a user and developer of FlinkCDC. In my recent
> > projects, I realized that we could enhance the capabilities of
> > Flink-CDC-Pipeline by introducing a JDBC Sink plugin, enabling FlinkCDC
> > to directly output change data capture (CDC) events to various
> > JDBC-supported database systems.
> >
> > Currently, while FlinkCDC offers support for a wide range of data
> > sources, there is no direct solution for sinks, especially for common
> > relational databases. I believe that adding a JDBC Sink plugin will
> > significantly boost its applicability in data integration scenarios.
> >
> > Specifically, this plugin would allow users to configure database
> > connections and stream data directly to SQL databases via the standard
> > JDBC interface. This could be used for data migration tasks as well as
> > real-time data synchronization.
> >
> > To further discuss this proposal and gather feedback from the community,
> > I have prepared a preliminary design draft and hope to discuss it in
> > detail at the upcoming community meeting. Please consider the potential
> > value of this feature and provide your insights and guidance.
> >
> > Thank you for your time and consideration. I look forward to your active
> > feedback and further discussion.
> >
> > [1] https://github.com/apache/flink-connector-jdbc
> >
>
