[ 
https://issues.apache.org/jira/browse/FLINK-38183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-38183:
-----------------------------------
    Labels: pull-request-available  (was: )

> Data loss when CDC reads from MySQL with out-of-order GTIDs
> -----------------------------------------------------------
>
>                 Key: FLINK-38183
>                 URL: https://issues.apache.org/jira/browse/FLINK-38183
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>    Affects Versions: 3.0.0
>         Environment: Flink CDC: 3.5-SNAPSHOT
> Flink: 1.20.1
>  
>            Reporter: LiuZeshan
>            Priority: Critical
>              Labels: pull-request-available
>
> By the design introduced in 
> [https://github.com/apache/flink-cdc/pull/2220|http://example.com], CDC only 
> considers the maximum GTID position and starts from it. For example, when 
> restoring from GTID offset 1-7:9-10, it automatically adjusts the offset to 
> 1-10, which wrongly skips GTID 8 and loses its data. The loss is especially 
> severe when GTID 8 is a large transaction. We have hit this problem many 
> times in production.
> MySQL 5.7+ supports parallel replication based on group commit 
> (LOGICAL_CLOCK). Conflict-free transactions are distributed by the SQL 
> thread (coordinator) to multiple worker threads for concurrent execution. 
> Although the primary generates contiguous GTIDs in commit order (such as 
> A:1-100), the worker threads on the replica may commit transactions out of 
> order. When CDC reads from a MySQL replica, it may therefore encounter the 
> following GTID order. The same scenario can also be constructed manually by 
> setting GTID_NEXT.
> {code:sql}
> SET @@SESSION.GTID_NEXT='XXX:1';
> INSERT ...;
> SET @@SESSION.GTID_NEXT='XXX:2';
> INSERT ...;
> ...
> SET @@SESSION.GTID_NEXT='XXX:7';
> INSERT ...;
> SET @@SESSION.GTID_NEXT='XXX:9';
> INSERT ...;
> SET @@SESSION.GTID_NEXT='XXX:10';
> INSERT ...;
> SET @@SESSION.GTID_NEXT='XXX:8';
> BEGIN;
> INSERT ...;
> ... 
> INSERT ...; -- (the 1,000,000th DML statement; the checkpoint is taken here)
> ...
> INSERT ...; -- (the 2,000,000th DML statement)
> COMMIT;
> SET @@SESSION.GTID_NEXT='XXX:11';
> INSERT ...; {code}
> The transaction at GTID 8 contains 2 million DML events. After 1 million of 
> them have been read, a checkpoint is triggered and completes. The recorded 
> GTID offset is 1-7:9-10 and the number of events to skip is 1 million, as 
> shown below.
> {code:java}
> offset={transaction_id=null, ts_sec=1754145492, file=mysql-bin.000190, 
> pos=1443601, kind=SPECIFIC, gtids=xxx:1-7:9-10, row=3, event=1000000, 
> server_id=123} {code}
> The job is then restarted and restored from this checkpoint. By design, CDC 
> automatically adjusts the offset to 1-10 and still skips 1 million events. 
> As a result, the 1 million unread events of GTID 8 are lost, together with 
> the data contained in the first 1 million events starting from GTID 11.
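The restart scenario described above can be reproduced in a scaled-down model. The following Python sketch is illustrative only and is not Flink CDC or Debezium code: the helpers `parse_intervals`, `missing_gtids`, `collapse_to_max` and `replay` are hypothetical names, the transaction at GTID 8 has 20 events instead of 2 million, and the skip count is 10 instead of 1 million. It assumes that after a restart the server resends every transaction not in the restored GTID set and that the reader drops the first `skip_events` events it receives.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) transaction numbers


def parse_intervals(spec: str) -> List[Interval]:
    """Parse the interval part of a GTID set, e.g. '1-7:9-10'."""
    intervals = []
    for part in spec.split(":"):
        start, _, end = part.partition("-")
        intervals.append((int(start), int(end or start)))
    return sorted(intervals)


def missing_gtids(intervals: List[Interval]) -> List[int]:
    """Transaction numbers that fall into the gaps between intervals."""
    missing = []
    for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
        missing.extend(range(prev_end + 1, next_start))
    return missing


def collapse_to_max(intervals: List[Interval]) -> List[Interval]:
    """Model of the reported behaviour: keep only the maximum position."""
    return [(intervals[0][0], max(end for _, end in intervals))]


def replay(binlog, executed: List[Interval], skip_events: int):
    """Events (gtid, index) delivered after a restart in this simplified model."""
    def is_executed(gtid: int) -> bool:
        return any(start <= gtid <= end for start, end in executed)

    pending = [(gtid, i) for gtid, count in binlog
               if not is_executed(gtid) for i in range(count)]
    return pending[skip_events:]


# Binlog order on the replica: 1..7, 9, 10, then the large GTID 8, then 11..25.
binlog = ([(g, 1) for g in range(1, 8)] + [(9, 1), (10, 1), (8, 20)]
          + [(g, 1) for g in range(11, 26)])

restored = parse_intervals("1-7:9-10")  # GTID set stored in the checkpoint
skip = 10                               # events of GTID 8 already processed

expected = replay(binlog, restored, skip)
actual = replay(binlog, collapse_to_max(restored), skip)
lost = [event for event in expected if event not in actual]

print("gap in restored set:", missing_gtids(restored))
print("lost events:", len(lost))
print("GTIDs with lost events:", sorted({gtid for gtid, _ in lost}))
```

In this model, preserving the gap in the restored set makes the skip count apply to the remainder of GTID 8, whereas collapsing the set to 1-10 makes the same skip count consume the events that start at GTID 11.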



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
