Flink CDC Issue Import created FLINK-34843: ----------------------------------------------
Summary: [Bug] BinlogSplitReader#pollSplitRecords return finishedSplit when exception occurs Key: FLINK-34843 URL: https://issues.apache.org/jira/browse/FLINK-34843 Project: Flink Issue Type: Bug Components: Flink CDC Reporter: Flink CDC Issue Import ## Search before asking - [X] I searched in the [issues|https://github.com/ververica/flink-cdc-connectors/issues) and found nothing similar. ## Flink version 1.18 ## Flink CDC version 3.0 ## Database and its version any ## Minimal reproduce step ### Current Code In current BinlogSplitReader#pollSplitRecords, when the currentTaskRunning = false, will return null, which is seen as fininished split. See: ```java //com.ververica.cdc.connectors.mysql.source.reader.MySqlSplitReader#pollSplitRecords return dataIt == null ? finishedSplit() : forRecords(dataIt); ``` ### Problem occurs: However, the currentTaskRunning = false in four situations: 1. the bounded stream split is finished( later in issue: https://github.com/ververica/flink-cdc-connectors/issues/2867) 2. the stream split is paused for new scanly tables(See MySqlSplitReader#suspendBinlogReaderIfNeed) 3. some exception occurs(See executorService.submit) 4. The BinlogSplitReader#close Only in the former two situations, the spilt is finished, otherwise problem will occor. For example, there is an unbounded stream split: <img width="1193" alt="image" src="https://github.com/ververica/flink-cdc-connectors/assets/125648852/ca9dd77c-d111-47c2-afb6-fc13232339a5"> * t1, add this unbouned stream split and start a new thread to fetch binlog. * t2, BinlogSplitReader#pollSplitRecords check there is no Exception at first. * t3, some excetpion occurs in binlogSplitTask(network, data error, and more), set currentTaskRunning = false. * t4, BinlogSplitReader#pollSplitRecords check currentTaskRunning is fasle, so return null, which is seen as fininished split. Then MysqlSourceReader move to next split. Thus, when the task is not running, we also need to distinguish whether the split is finished more carefully. I have two idea: 1. add lock(not a good choice] 2. when the stream split is paused, we also add an END watermark to queue. Only when get an END watermark, BinlogSplitReader#pollSplitRecords return null, otherwise return empty collections. ## What did you expect to see? Only when get an END watermark, BinlogSplitReader#pollSplitRecords return null, otherwise return empty collections. ## What did you see instead? [Bug] BinlogSplitReader#pollSplitRecords return finishedSplit(null) when exception occurs ## Anything else? _No response_ ## Are you willing to submit a PR? - [X] I'm willing to submit a PR! ---------------- Imported from GitHub ---------------- Url: https://github.com/apache/flink-cdc/issues/2878 Created by: [loserwang1024|https://github.com/loserwang1024] Labels: bug, Created at: Mon Dec 18 09:48:18 CST 2023 State: open -- This message was sent by Atlassian Jira (v8.20.10#820010)