[
https://issues.apache.org/jira/browse/FLINK-38183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-38183:
-----------------------------------
Labels: pull-request-available (was: )
> Data loss when cdc reading mysql that has out of order GTID
> -----------------------------------------------------------
>
> Key: FLINK-38183
> URL: https://issues.apache.org/jira/browse/FLINK-38183
> Project: Flink
> Issue Type: Bug
> Components: Flink CDC
> Affects Versions: 3.0.0
> Environment: Flink-CDC: 3.5-SNAPSHOT
> Flink:1.20.1
>
> Reporter: LiuZeshan
> Priority: Critical
> Labels: pull-request-available
>
> As designed in
> [https://github.com/apache/flink-cdc/pull/2220|http://example.com], CDC only
> considers the maximum GTID position and starts from it. For example, if
> reading from GTID offset 1-7:9-10, it automatically adjusts to read from
> 1-10, which wrongly skips GTID 8 and loses its data. The loss is
> especially severe when GTID 8 is a large transaction. We have encountered
> this problem many times in production.
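
The interval arithmetic described above can be sketched in a few lines. This is a simplified model of the behaviour the report describes, not Flink CDC code: the helper names (`parse_intervals`, `collapse_to_max`, `missing_ids`, `contains`) are hypothetical, and only the interval part of a GTID set for a single server UUID is handled.

```python
def parse_intervals(spec):
    """Parse the interval part of a GTID set, e.g. '1-7:9-10' -> [(1, 7), (9, 10)]."""
    intervals = []
    for part in spec.split(":"):
        start, _, end = part.partition("-")
        # A single id such as '5' has no end, so it forms the interval (5, 5).
        intervals.append((int(start), int(end or start)))
    return sorted(intervals)


def collapse_to_max(intervals):
    """Hypothetical stand-in for the 'only the maximum matters' adjustment."""
    return [(intervals[0][0], max(end for _, end in intervals))]


def missing_ids(intervals):
    """Transaction ids that fall in the gaps between consecutive intervals."""
    gaps = []
    for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
        gaps.extend(range(prev_end + 1, next_start))
    return gaps


def contains(intervals, txn_id):
    """True if txn_id is covered by any interval, i.e. treated as already read."""
    return any(start <= txn_id <= end for start, end in intervals)


recorded = parse_intervals("1-7:9-10")
adjusted = collapse_to_max(recorded)

print(missing_ids(recorded))   # ids not yet read according to the checkpoint
print(contains(recorded, 8))   # GTID 8 is still pending in the recorded set
print(contains(adjusted, 8))   # after collapsing, GTID 8 looks already read
```

In this model the gap at 8 is visible in the recorded set and disappears once the set is collapsed to its maximum, which is the point at which the transaction would be skipped.
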
> MySQL 5.7+ supports parallel replication based on group commit
> (LOGICAL_CLOCK). Conflict-free transactions are distributed from the SQL
> thread (coordinator) of the slave to multiple worker threads for
> concurrent execution. Although the master generates contiguous GTIDs
> in commit order (such as A:1-100), the worker threads of the
> slave may commit transactions out of order. When CDC
> reads from the MySQL slave, it may encounter the following GTID order. The
> same scenario can also be constructed manually by setting the GTID.
> {code:sql}
> SET @@SESSION.GTID_NEXT='XXX:1';
> INSERT ...;
> SET @@SESSION.GTID_NEXT='XXX:2';
> INSERT ...;
> ...
> SET @@SESSION.GTID_NEXT='XXX:7';
> INSERT ...;
> SET @@SESSION.GTID_NEXT='XXX:9';
> INSERT ...;
> SET @@SESSION.GTID_NEXT='XXX:10';
> INSERT ...;
> SET @@SESSION.GTID_NEXT='XXX:8';
> BEGIN;
> INSERT ...;
> ...
> INSERT ...; -- (the 1 millionth DML statement; checkpoint taken at this position)
> ...
> INSERT ...; -- (the 2 millionth DML statement)
> COMMIT;
> SET @@SESSION.GTID_NEXT='XXX:11';
> INSERT ...; {code}
> The transaction at GTID 8 contains 2 million DML events. After 1 million of
> them have been read, a checkpoint is triggered and completed. The recorded
> GTID offset is 1-7:9-10 and the number of events to skip is 1 million, as
> shown below.
> {code:java}
> offset={transaction_id=null, ts_sec=1754145492, file=mysql-bin.000190,
> pos=1443601, kind=SPECIFIC, gtids=xxx:1-7:9-10, row=3, event=1000000,
> server_id=123} {code}
> The job is then restarted and recovers from this checkpoint. By design, CDC
> automatically adjusts the GTID set to 1-10 and still skips 1 million events.
> As a result, the 1 million unread events of GTID 8 are lost, along with the
> data in the 1 million events that are skipped starting from GTID 11.
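
The restart scenario can be reproduced on a scaled-down model. The sketch below is an illustration under stated assumptions, not the connector's restore logic: the `replay` function is hypothetical, GTID 8 is shrunk to 20 events with the checkpoint taken after 10 of them, and the skip counter is assumed to apply to whatever events remain after GTID filtering.

```python
def build_binlog():
    """Scaled-down binlog in commit order, as (gtid_number, event_index) pairs."""
    order = [1, 2, 3, 4, 5, 6, 7, 9, 10, 8, 11]
    sizes = {8: 20}  # GTID 8 is the large transaction; every other GTID has one event
    return [(gtid, i) for gtid in order for i in range(sizes.get(gtid, 1))]


def replay(binlog, executed, skip_events):
    """Hypothetical restore model: drop transactions whose GTID is in `executed`,
    then drop the first `skip_events` of the remaining events and emit the rest."""
    remaining = [event for event in binlog if event[0] not in executed]
    return remaining[skip_events:]


binlog = build_binlog()

# The checkpoint was taken after 10 of the 20 events in GTID 8 were emitted,
# so a correct restore must emit the other 10 events of GTID 8 and then GTID 11.
expected = [(8, i) for i in range(10, 20)] + [(11, 0)]

recorded = set(range(1, 8)) | {9, 10}   # gtids=1-7:9-10 from the checkpoint
adjusted = set(range(1, 11))            # the same set collapsed to 1-10

correct = replay(binlog, recorded, skip_events=10)
broken = replay(binlog, adjusted, skip_events=10)
lost = [event for event in expected if event not in broken]

print(correct == expected)  # restoring with the recorded set loses nothing
print(len(lost))            # events lost when the set is collapsed to 1-10
```

With the recorded set, the skip counter consumes exactly the events of GTID 8 that were already emitted. With the collapsed set, GTID 8 is filtered out entirely and the same skip counter is spent on later transactions instead, which matches the two kinds of loss described in the report.
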
--
This message was sent by Atlassian Jira
(v8.20.10#820010)