gustavodemorais commented on code in PR #28199: URL: https://github.com/apache/flink/pull/28199#discussion_r3273271114
########## docs/content/docs/sql/reference/queries/changelog.md: ########## @@ -369,6 +381,13 @@ When `PARTITION BY` is provided, **the output schema changes**. The partition ke Prefer row semantics, when possible. `PARTITION BY` is only necessary when downstream operators are keyed on that column and you want to co-locate rows for the same key in the same parallel operator instance. +#### Avoiding ChangelogNormalize for upsert sources + +When the input is an upsert source (emits `UPDATE_AFTER` but no `UPDATE_BEFORE`), the planner inserts a `ChangelogNormalize` operator by default to materialize `UPDATE_BEFORE` rows and complete `DELETE` payloads. This operator is stateful and can be expensive. When `PARTITION BY` is provided, the planner skips `ChangelogNormalize` if `op_mapping` does not emit the corresponding kinds: Review Comment: I think you have a point. In the future, when we support state, the idea is that both to and from changelog can indeed replace ChangelogNormalize in some cases. I think it's currently not the case and can be misleading and not easy for them to understand. I'll remove this whole section for now and give it a go again when we eventually add state support. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
