[PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]
loserwang1024 opened a new pull request, #3230:
URL: https://github.com/apache/flink-cdc/pull/3230

In MySQL CDC, MySqlBinlogSplit#appendFinishedSplitInfos re-calculates the starting binlog offset after a new table is added, but StreamSplit#appendFinishedSplitInfos lacks the same step. This causes data loss whenever the high watermark of any newly added table's snapshot split is smaller than the stream split's current starting offset. Some tests are unstable because of it.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
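To make the data-loss scenario concrete, here is a minimal Java sketch under a deliberately simplified model (an assumption, not Flink CDC's actual reader logic): offsets are plain `long` positions, the stream reader only sees events at or after its starting offset, and a change event for a table is emitted only if it lies after that table's snapshot high watermark. The class, record, and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StartingOffsetDemo {
    // Hypothetical simplified change event; offsets are plain longs, not Flink CDC's Offset.
    record Event(String table, long offset) {}

    // Simplified model: an event is emitted if the stream reader reaches it
    // (offset >= start) and it lies after its table's snapshot high watermark.
    static List<Event> emitted(List<Event> log, long start, Map<String, Long> highWatermarks) {
        return log.stream()
                .filter(e -> e.offset() >= start)
                .filter(e -> e.offset() > highWatermarks.get(e.table()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "orders" was captured earlier and the stream split starts at offset 100.
        // Newly added table "users" finished its snapshot with high watermark 80.
        Map<String, Long> hw = Map.of("orders", 100L, "users", 80L);
        List<Event> log = List.of(
                new Event("users", 90), new Event("orders", 110), new Event("users", 120));

        long oldStart = 100;                    // starting offset NOT re-calculated
        long newStart = Math.min(oldStart, 80); // re-calculated: min(start, new high watermark)

        System.out.println(emitted(log, oldStart, hw)); // the users change at offset 90 is lost
        System.out.println(emitted(log, newStart, hw)); // all three changes are emitted
    }
}
```

With the starting offset left at 100, the `users` change at offset 90 is never read even though it comes after the `users` high watermark; moving the start back to 80 recovers it.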
Re: [PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]
loserwang1024 commented on PR #3230:
URL: https://github.com/apache/flink-cdc/pull/3230#issuecomment-2060235695

@PatrickRen, @morazow, CC
Re: [PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]
loserwang1024 commented on code in PR #3230:
URL: https://github.com/apache/flink-cdc/pull/3230#discussion_r1568614060

## flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/meta/split/StreamSplit.java: ##
@@ -159,7 +159,15 @@ public String toString() {
     // ---
     public static StreamSplit appendFinishedSplitInfos(
             StreamSplit streamSplit, List<FinishedSnapshotSplitInfo> splitInfos) {
+        // re-calculate the starting changelog offset after the new table added
+        Offset startingOffset = streamSplit.getStartingOffset();
+        for (FinishedSnapshotSplitInfo splitInfo : splitInfos) {
+            if (splitInfo.getHighWatermark().isBefore(startingOffset)) {
+                startingOffset = splitInfo.getHighWatermark();
+            }
+        }
         splitInfos.addAll(streamSplit.getFinishedSnapshotSplitInfos());
+
         return new StreamSplit(
                 streamSplit.splitId,
                 streamSplit.getStartingOffset(),

Review Comment:
It seems true.
Re: [PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]
yuxiqian commented on code in PR #3230:
URL: https://github.com/apache/flink-cdc/pull/3230#discussion_r1568608219

## flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/meta/split/StreamSplit.java: ##
@@ -159,7 +159,15 @@ public String toString() {
     // ---
     public static StreamSplit appendFinishedSplitInfos(
             StreamSplit streamSplit, List<FinishedSnapshotSplitInfo> splitInfos) {
+        // re-calculate the starting changelog offset after the new table added
+        Offset startingOffset = streamSplit.getStartingOffset();
+        for (FinishedSnapshotSplitInfo splitInfo : splitInfos) {
+            if (splitInfo.getHighWatermark().isBefore(startingOffset)) {
+                startingOffset = splitInfo.getHighWatermark();
+            }
+        }
         splitInfos.addAll(streamSplit.getFinishedSnapshotSplitInfos());
+
         return new StreamSplit(
                 streamSplit.splitId,
                 streamSplit.getStartingOffset(),

Review Comment:
CMIIW, but it seems the newly added code only computes the earliest starting offset into `startingOffset` and never uses it to create the new `StreamSplit`. Maybe a change was missed here?
```suggestion
        return new StreamSplit(
                streamSplit.splitId,
                startingOffset,
```
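For illustration, the method with the suggestion applied might look like the following self-contained sketch. `Offset`, `FinishedSnapshotSplitInfo`, and `StreamSplit` here are hypothetical, trimmed-down records standing in for the real Flink CDC classes (which carry more fields and a richer `Offset` comparison); only the re-calculation loop mirrors the diff.

```java
import java.util.ArrayList;
import java.util.List;

public class AppendFinishedSplitInfosSketch {
    // Hypothetical stand-in for Flink CDC's Offset: a single comparable position.
    record Offset(long position) {
        boolean isBefore(Offset other) {
            return position < other.position;
        }
    }

    // Hypothetical stand-in for FinishedSnapshotSplitInfo: only the high watermark matters here.
    record FinishedSnapshotSplitInfo(String splitId, Offset highWatermark) {
        Offset getHighWatermark() {
            return highWatermark;
        }
    }

    // Hypothetical, trimmed-down StreamSplit with only the fields this fix touches.
    record StreamSplit(
            String splitId,
            Offset startingOffset,
            List<FinishedSnapshotSplitInfo> finishedSnapshotSplitInfos) {}

    static StreamSplit appendFinishedSplitInfos(
            StreamSplit streamSplit, List<FinishedSnapshotSplitInfo> splitInfos) {
        // Move the starting offset back to the earliest high watermark among the
        // newly finished splits, so no change after those watermarks is skipped.
        Offset startingOffset = streamSplit.startingOffset();
        for (FinishedSnapshotSplitInfo splitInfo : splitInfos) {
            if (splitInfo.getHighWatermark().isBefore(startingOffset)) {
                startingOffset = splitInfo.getHighWatermark();
            }
        }
        // Copy instead of mutating the caller's list (the diff calls splitInfos.addAll).
        List<FinishedSnapshotSplitInfo> merged = new ArrayList<>(splitInfos);
        merged.addAll(streamSplit.finishedSnapshotSplitInfos());
        // The reviewer's point: pass the re-calculated offset, not the original one.
        return new StreamSplit(streamSplit.splitId(), startingOffset, merged);
    }
}
```

Copying into a new list is a small deviation from the diff; it keeps the sketch usable with immutable inputs such as `List.of(...)`.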
Re: [PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]
loserwang1024 commented on code in PR #3230:
URL: https://github.com/apache/flink-cdc/pull/3230#discussion_r1569794515

## flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/meta/split/StreamSplit.java: ##
@@ -159,7 +159,15 @@ public String toString() {
     // ---
     public static StreamSplit appendFinishedSplitInfos(
             StreamSplit streamSplit, List<FinishedSnapshotSplitInfo> splitInfos) {
+        // re-calculate the starting changelog offset after the new table added
+        Offset startingOffset = streamSplit.getStartingOffset();
+        for (FinishedSnapshotSplitInfo splitInfo : splitInfos) {
+            if (splitInfo.getHighWatermark().isBefore(startingOffset)) {
+                startingOffset = splitInfo.getHighWatermark();
+            }
+        }
         splitInfos.addAll(streamSplit.getFinishedSnapshotSplitInfos());
+
         return new StreamSplit(
                 streamSplit.splitId,
                 streamSplit.getStartingOffset(),

Review Comment:
Done.
Re: [PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]
morazow commented on code in PR #3230:
URL: https://github.com/apache/flink-cdc/pull/3230#discussion_r1569850914

## flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/meta/split/StreamSplit.java: ##
@@ -163,10 +163,18 @@ public String toString() {
     // ---
     public static StreamSplit appendFinishedSplitInfos(
             StreamSplit streamSplit, List<FinishedSnapshotSplitInfo> splitInfos) {
+        // re-calculate the starting changelog offset after the new table added
+        Offset startingOffset = streamSplit.getStartingOffset();
+        for (FinishedSnapshotSplitInfo splitInfo : splitInfos) {
+            if (splitInfo.getHighWatermark().isBefore(startingOffset)) {
+                startingOffset = splitInfo.getHighWatermark();
+            }
+        }

Review Comment:
Do we have to distinguish between the high watermarks that are before the startingOffset? For example, if there are multiple high watermarks before startingOffset, which one should we take? Should it be the latest of them? Or is it all right to take any high watermark that is before the startingOffset?
Re: [PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]
morazow commented on code in PR #3230:
URL: https://github.com/apache/flink-cdc/pull/3230#discussion_r1569863282

## flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/meta/split/StreamSplit.java: ##
@@ -163,10 +163,18 @@ public String toString() {
     // ---
     public static StreamSplit appendFinishedSplitInfos(
             StreamSplit streamSplit, List<FinishedSnapshotSplitInfo> splitInfos) {
+        // re-calculate the starting changelog offset after the new table added
+        Offset startingOffset = streamSplit.getStartingOffset();
+        for (FinishedSnapshotSplitInfo splitInfo : splitInfos) {
+            if (splitInfo.getHighWatermark().isBefore(startingOffset)) {
+                startingOffset = splitInfo.getHighWatermark();
+            }
+        }

Review Comment:
Got it, it will always be the min value.
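On the question of which high watermark to take: the loop replaces `startingOffset` every time it finds an earlier watermark, so the result is the minimum of the original starting offset and all new high watermarks, regardless of their order. A minimal sketch comparing the loop against a direct `min`, with offsets simplified to `long` and hypothetical class and method names:

```java
import java.util.List;
import java.util.stream.LongStream;

public class MinWatermarkSketch {
    // The loop from the diff, with offsets simplified to longs (hypothetical).
    static long viaLoop(long start, List<Long> highWatermarks) {
        long result = start;
        for (long hw : highWatermarks) {
            if (hw < result) { // plays the role of isBefore
                result = hw;
            }
        }
        return result;
    }

    // Equivalent closed form: min(start, hw_1, ..., hw_n).
    static long viaMin(long start, List<Long> highWatermarks) {
        return LongStream.concat(
                        LongStream.of(start),
                        highWatermarks.stream().mapToLong(Long::longValue))
                .min()
                .getAsLong();
    }

    public static void main(String[] args) {
        List<Long> hws = List.of(90L, 70L, 85L);
        System.out.println(viaLoop(100, hws) + " " + viaMin(100, hws)); // prints "70 70"
    }
}
```

Because the result is a plain minimum, there is no need to pick "the latest" watermark below the starting offset: the earliest one is always the safe choice.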
Re: [PR] [FLINK-35128][cdc-connector][cdc-base] Re-calculate the starting changelog offset after the new table added [flink-cdc]
PatrickRen merged PR #3230:
URL: https://github.com/apache/flink-cdc/pull/3230