[ 
https://issues.apache.org/jira/browse/FLINK-39149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-39149:
-----------------------------------
    Labels: pull-request-available  (was: )

> `gtid.new.channel.position=LATEST` Incorrectly Replays Pre-Checkpoint 
> Transactions for Old Channels in Multi-Source Replication
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-39149
>                 URL: https://issues.apache.org/jira/browse/FLINK-39149
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>            Reporter: Jia Fan
>            Priority: Major
>              Labels: pull-request-available
>
> When using `gtid.new.channel.position=LATEST` and restarting from a 
> checkpoint,
> the GTID merging logic in `filterGtidSet` correctly skips history for *new 
> channels*
> (UUIDs not present in the checkpoint), but fails to handle *old channels*
> (UUIDs present in the checkpoint) whose recorded GTID range does not start 
> from `:1`.
> This causes MySQL to replay pre-checkpoint historical transactions, leading to
> {*}duplicate data or out-of-order events{*}.
> h3. Actual Behavior
> The `LATEST` branch in `filterGtidSet` executes:
> {code:java}
> mergedGtidSet = availableServerGtidSet.with(filteredGtidSet); {code}
> The `with` operation produces a union, using `filteredGtidSet` (checkpoint 
> value)
> as the override for known UUIDs. For an old channel with a non-contiguous 
> checkpoint
> GTID, the result is:
> {code:java}
> Checkpoint GTID: aaa-111:5000-8000
> Server GTID: aaa-111:1-10000
> bbb-222:1-3000
> Merged GTID sent to MySQL:
> aaa-111:5000-8000,
> bbb-222:1-3000 {code}
> MySQL then needs to send back:
> {code:java}
> aaa-111:1-4999 ← pre-checkpoint transactions, already consumed or should be 
> skipped
> aaa-111:8001-10000 {code}
> The pre-checkpoint transactions `aaa-111:1-4999` are {*}replayed 
> unexpectedly{*},
> causing data duplication.
> h3. Expected Behavior
> For old channels, the merged GTID should reflect all transactions up to the
> checkpoint boundary, without triggering a replay of pre-checkpoint history.
> Expected merged GTID:
> {code:java}
> aaa-111:1-8000,
> bbb-222:1-3000 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to