[
https://issues.apache.org/jira/browse/FLINK-40038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18092921#comment-18092921
]
牛一凡 commented on FLINK-40038:
-----------------------------
I would like to work on this issue. Please assign it to me.
> [mysql][pipeline] Incremental sync throughput is low in hotspot UPDATE
> workloads due to deserialization overhead
> ----------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-40038
> URL: https://issues.apache.org/jira/browse/FLINK-40038
> Project: Flink
> Issue Type: Improvement
> Components: Flink CDC
> Affects Versions: cdc-3.7.0
> Reporter: 牛一凡
> Priority: Minor
> Attachments: image-2026-07-01-18-11-43-636.png
>
>
> *Motivation*
> We observed low incremental sync throughput on a MySQL-to-Doris pipeline when
> using Flink CDC in a hotspot UPDATE workload on a large table. In this
> scenario, the downstream fell behind the upstream and the job showed an
> obvious backlog during incremental synchronization.
> After collecting and analyzing the job flame graph, we found that a
> significant portion of the CPU time was spent in the MySQL pipeline
> deserialization path, especially around repeated schema/data type inference
> during row deserialization. This overhead becomes more noticeable when a
> table receives frequent UPDATE events.
> A related performance concern was mentioned in FLINK-35715, but in our
> workload this bottleneck still exists and is severe enough to cause
> persistent catch-up lag in production-like environments.
> It would be great to investigate this bottleneck further and optimize it.
> *Flame Graph*
> !image-2026-07-01-18-11-43-636.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)