Hi team, Following my discussion with Leonard Xu at Flink Forward, I am writing to propose a feature enhancement for the Flink MySQL CDC connector related to how it handles transaction metadata from the MySQL binary log.
Problem Statement: In data streaming pipelines that require transactional guarantees or need to group atomic changes together, it is essential to identify the boundaries of the original database transaction (i.e., the BEGIN and COMMIT or END events). Currently, the Flink MySQL CDC connector appears to skip these transaction lifecycle events - https://github.com/apache/flink-cdc/blob/23a1c2efb6fa9ce1c9f17b3836f6aaa995bb0660/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/org/apache/flink/cdc/connectors/mysql/source/reader/MySqlRecordEmitter.java#L77 . I have also attached a screenshot of the logs from this behaviour. This omission makes it challenging to reconstruct the original transaction scope. Without explicit transaction markers, downstream Flink jobs cannot easily guarantee atomicity across sinks. Proposed Solution: The underlying CDC mechanism, Debezium, supports emitting transaction boundary events (BEGIN and END/COMMIT) through its configuration. We propose enhancing the Flink MySQL CDC connector to expose this transaction metadata to the Flink pipeline. The connector should emit specialised records or metadata fields that indicate the start and end of a transaction as emitted. We would be happy to create a PR with this feature if this proposal goes ahead. Thank you, Tejansh
