Flink CDC Issue Import created FLINK-34889:
----------------------------------------------

Summary: Flink CDC may report that the binlog can't be found when tables are dynamically added repeatedly
Key: FLINK-34889
URL: https://issues.apache.org/jira/browse/FLINK-34889
Project: Flink
Issue Type: Bug
Components: Flink CDC
Reporter: Flink CDC Issue Import

We ran into a strange problem with Flink CDC: when tables are repeatedly added to a Flink CDC job, the job may fail and report that a very old GTID can't be found. We dug into the source code and found the cause, described below:

1. When CDC switches from the full phase to the incremental phase, the binlog reader needs to pull the ending offsets of all chunks, and it takes the minimum of these offsets as the starting offset of the incremental phase. The ending offset of each chunk is stored in the JM.
2. We add tables repeatedly, and each time we need to suspend the job, alter the config, and then resume from the latest checkpoint.
3. Normally, once a table has been added, we pull the ending offset of each chunk. The pull process passes a size between the JM and the TM: when there are 100 tables in the JM and we have already processed 80, we need to process the 81st to pull the next offset.
4. The problem is that the order of the splits in the JM and in the TM is not the same. The JM orders splits by table name (such as a:0, a:1, b:0, b:1). When a table is added, we need to pull the ending offsets of the newly added table, but because the JM orders the splits by table name, the newly added table may land in the middle of the list, so we may get the ending offset of a very old split.

<img width="1386" alt="1" src="https://github.com/apache/flink-cdc/assets/5321584/f8383f59-82d9-4d97-bad7-1aea54c6ac81">

---------------- Imported from GitHub ----------------
Url: https://github.com/apache/flink-cdc/issues/3141
Created by: [zlzhang0122|https://github.com/zlzhang0122]
Labels:
Created at: Wed Mar 13 21:17:17 CST 2024
State: open
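
The ordering mismatch described in points 3 and 4 can be illustrated with a small toy model. This is only a sketch under assumptions, not Flink CDC code: the functions `jm_finished_splits` and `tm_pull`, the table names, and the offset values are all hypothetical, and the real JM/TM exchange carries more state than a plain count.

```python
# Hypothetical model of the report above; nothing here is Flink CDC API.

def jm_finished_splits(offsets_by_split):
    """JM side: finished splits ordered by split id (table name, chunk index)."""
    return sorted(offsets_by_split.items())


def tm_pull(jm_view, already_received):
    """TM side: request only the entries after the count it already holds."""
    return jm_view[already_received:]


# First run: tables a and c are snapshotted and the TM receives all four entries.
offsets = {("a", 0): 100, ("a", 1): 110, ("c", 0): 120, ("c", 1): 130}
tm_known = tm_pull(jm_finished_splits(offsets), 0)

# Table b is added later, so its chunks finish at much newer binlog offsets.
offsets.update({("b", 0): 900, ("b", 1): 910})
new_table_splits = [(("b", 0), 900), (("b", 1), 910)]

# The TM already holds 4 entries, so it asks for everything from index 4 onward.
pulled = tm_pull(jm_finished_splits(offsets), len(tm_known))

# Minimum ending offset among the pulled entries, as in point 1.
start_offset = min(offset for _, offset in pulled)

print("expected:", new_table_splits)
print("pulled:  ", pulled)
print("starting offset:", start_offset)
```

In this model the pull returns the two entries for table c (offsets 120 and 130) instead of the entries for the newly added table b, because b sorts into the middle of the JM's list. The incremental phase would then start from offset 120, which matches the symptom of a very old GTID being requested.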