Flink CDC Issue Import created FLINK-34889:
----------------------------------------------
Summary: Flink CDC may occur binlog can't be find when dynamic
added table repeatedly
Key: FLINK-34889
URL: https://issues.apache.org/jira/browse/FLINK-34889
Project: Flink
Issue Type: Bug
Components: Flink CDC
Reporter: Flink CDC Issue Import
We hit a strange problem with Flink CDC: when tables are repeatedly added to a
Flink CDC job, the job may fail and report that a very old GTID can't be found.
We dug into the source code and found the cause, described below:
1. When CDC switches from the full phase to the incremental phase, the binlog
reader needs to pull the ending offsets of all chunks, and it takes the minimum
of these offsets as the starting offset of the incremental phase. The ending
offset of each chunk is stored in the JM.
2. If we add tables repeatedly, each time we need to suspend the job, alter
the config, and then resume from the latest checkpoint.
3. Normally, once a table has been added, we pull the ending offset of each
chunk. The pull process transfers a size between the JM and the TM, which means
that when there are 100 tables in the JM and we have processed 80, we need to
process the 81st to pull the next offset.
4. There is one problem: the order of the splits in the JM and the TM is not
the same. The JM orders splits by table name (such as a:0, a:1, b:0, b:1). When
a table is added, we need to pull the ending offsets of the newly added table,
but because the JM orders the splits by table name, the newly added table may
land in the middle, so we may get the ending offset of a very old split.
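
The steps above can be reproduced with a small stand-alone sketch. This is a
simplified model and not the actual Flink CDC code: the class and method names
(SplitOrderSketch, initialSplits, pullFrom, minEndingOffset) are hypothetical,
binlog positions are represented as plain long values, and the "size"
exchanged between the JM and the TM is assumed to be the number of entries the
TM already holds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simplified model of the problem; none of these names exist in Flink CDC.
class SplitOrderSketch {

    // JM side: finished splits keyed by "table:chunk", sorted by key,
    // each mapped to its ending offset (a plain long for illustration).
    static TreeMap<String, Long> initialSplits() {
        TreeMap<String, Long> jm = new TreeMap<>();
        jm.put("a:0", 100L);
        jm.put("a:1", 110L);
        jm.put("c:0", 120L);
        jm.put("c:1", 130L);
        return jm;
    }

    // TM side: "I already hold receivedCount entries, send me the rest".
    // The JM answers by position in its sorted list.
    static List<String> pullFrom(TreeMap<String, Long> jm, int receivedCount) {
        List<String> ordered = new ArrayList<>(jm.keySet());
        int from = Math.min(receivedCount, ordered.size());
        return new ArrayList<>(ordered.subList(from, ordered.size()));
    }

    // Minimum ending offset among the given splits.
    static long minEndingOffset(TreeMap<String, Long> jm, List<String> splitIds) {
        long min = Long.MAX_VALUE;
        for (String id : splitIds) {
            min = Math.min(min, jm.get(id));
        }
        return min;
    }

    public static void main(String[] args) {
        TreeMap<String, Long> jm = initialSplits();
        int received = jm.size(); // TM already holds a:0, a:1, c:0, c:1

        // Table b is added later; its splits sort into the middle on the JM.
        jm.put("b:0", 900L);
        jm.put("b:1", 910L);

        List<String> pulled = pullFrom(jm, received);
        System.out.println("pulled: " + pulled);                       // pulled: [c:0, c:1]
        System.out.println("min: " + minEndingOffset(jm, pulled));     // min: 120
    }
}
```

In this model the TM expects the splits of the new table b, but receives the
old splits c:0 and c:1 instead, so the offset it works with (120) is far older
than the offsets of the newly added table (900 and 910).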
Screenshot:
https://github.com/apache/flink-cdc/assets/5321584/f8383f59-82d9-4d97-bad7-1aea54c6ac81
---------------- Imported from GitHub ----------------
Url: https://github.com/apache/flink-cdc/issues/3141
Created by: [zlzhang0122|https://github.com/zlzhang0122]
Labels:
Created at: Wed Mar 13 21:17:17 CST 2024
State: open
--
This message was sent by Atlassian Jira
(v8.20.10#820010)