Hi team,

Following my discussion with Leonard Xu at Flink Forward, I am writing to 
propose a feature enhancement for the Flink MySQL CDC connector related to how 
it handles transaction metadata from the MySQL binary log.

Problem Statement:
In data streaming pipelines that require transactional guarantees or need to 
group atomic changes together, it is essential to identify the boundaries of 
the original database transaction (i.e., the BEGIN and COMMIT or END events). 
Currently, the Flink MySQL CDC connector appears to skip these transaction 
lifecycle events - 
https://github.com/apache/flink-cdc/blob/23a1c2efb6fa9ce1c9f17b3836f6aaa995bb0660/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/org/apache/flink/cdc/connectors/mysql/source/reader/MySqlRecordEmitter.java#L77
 .
I have also attached a screenshot of the logs from this  behaviour.

This omission makes it challenging to reconstruct the original transaction 
scope. Without explicit transaction markers, downstream Flink jobs cannot 
easily guarantee atomicity across sinks.

Proposed Solution:
The underlying CDC mechanism, Debezium, supports emitting transaction boundary 
events (BEGIN and END/COMMIT) through its configuration.

We propose enhancing the Flink MySQL CDC connector to expose this transaction 
metadata to the Flink pipeline. The connector should emit specialised records 
or metadata fields that indicate the start and end of a transaction as emitted. 
We would be happy to create a PR with this feature if this proposal goes ahead.

Thank you,
Tejansh


Reply via email to