upsert stream with set semantics [flink]

via GitHub Wed, 20 May 2026 03:47:48 -0700


gustavodemorais commented on code in PR #28199:
URL: https://github.com/apache/flink/pull/28199#discussion_r3273271114



##########
docs/content/docs/sql/reference/queries/changelog.md:
##########
@@ -369,6 +381,13 @@ When `PARTITION BY` is provided, **the output schema 
changes**. The partition ke
 
 Prefer row semantics, when possible. `PARTITION BY` is only necessary when 
downstream operators are keyed on that column and you want to co-locate rows 
for the same key in the same parallel operator instance.
 
+#### Avoiding ChangelogNormalize for upsert sources
+
+When the input is an upsert source (emits `UPDATE_AFTER` but no 
`UPDATE_BEFORE`), the planner inserts a `ChangelogNormalize` operator by 
default to materialize `UPDATE_BEFORE` rows and complete `DELETE` payloads. 
This operator is stateful and can be expensive. When `PARTITION BY` is 
provided, the planner skips `ChangelogNormalize` if `op_mapping` does not emit 
the corresponding kinds:

Review Comment:
   I think you have a point. In the future, when we support state, the idea is 
that both to and from changelog can indeed replace ChangelogNormalize in some 
cases. I think it's currently not the case and can be misleading and not easy 
for them to understand. I'll remove this whole section for now and give it a go 
again when we eventually add state support.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [FLINK-39708][table] Support TO_CHANGELOG retract/upsert stream -> upsert stream with set semantics [flink]

Reply via email to