xuyangzhong commented on PR #26879: URL: https://github.com/apache/flink/pull/26879#issuecomment-3173232075
> @xuyangzhong what do you think about this change? Is acceptable to simply update the affected test or do we need changes in `DuplicateChangesInferRule`? The current Flink behavior is incorrect: if a sink requests retract, the engine needs to guarantee that -U will be send, independent of the primary key declaration. Regarding this change for delta join, I've thought about it, and it seems necessary for the DuplicateChangesInferRule to return false when dealing with retract sinks (in https://github.com/apache/flink/blob/cae5fb4d3b6d9e0c10c3539ea4994fc1ad463b70/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/plan/rules/physical/stream/DuplicateChangesInferRule.java#L169). This is because delta joins might generate redundant data, while retract sinks require data to appear in pairs as both '+' and '-' forms. This issue stands on its own, so I can go ahead and create a separate jira to resolve it if you prefer. It's entirely your decision. In summary, here are the situations regarding the failed cases below: - DuplicateChangesInferRuleTest.testSinkWithMaterialize: the `duplicateChanges` in upstream operator for sink should be `DISALLOW `. - DeltaJoinTest.testCdcSource: just apply new changes. - DeltaJoinTest.testWithAggregatingAfterJoin: just apply new changes. - DeltaJoinTest.testWithAggregatingSourceTableBeforeJoin: just apply new changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
