[
https://issues.apache.org/jira/browse/FLINK-37327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928447#comment-17928447
]
Kevin Lam commented on FLINK-37327:
-----------------------------------
This Jira ticket is about a change to the Deserialization/Decoder logic,
supporting UPSERT (ChangelogMode=[I,UA,D]) mode.
I propose to be able to skip emitting the UPDATE_BEFORE Rows from a
DynamicTableSource using DebeziumAvroFormat, so that downstream operators can
save on processing the UPDATE_BEFORE Rows. This is fine when downstream
operators do not need the UPDATE_BEFORE retractions for correctness. Timo
Walther discusses this optimization in this presentation as well:
https://youtu.be/iRlLaY-P6iE?si=2QTlrVnh-iEXzYPb&t=1176
On the Serialization/Encoder side, there would be no change in behaviour other
than the advertised ChangelogMode, since as you said "Flink encodes
UPDATE_BEFORE and UDPATE_AFTER as DELETE and INSERT Debezium messages", there's
no UPDATE_BEFORE Rows to skip.
Does that make sense?
> Debezium Avro Format: Add FormatOption to Optionally Skip emitting
> UPDATE_BEFORE Rows
> -------------------------------------------------------------------------------------
>
> Key: FLINK-37327
> URL: https://issues.apache.org/jira/browse/FLINK-37327
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 1.20.1
> Reporter: Kevin Lam
> Priority: Minor
> Labels: pull-request-available
>
> Add a Format Option to the Debezium Format to optionally skip emitting the
> UPDATE_BEFORE Rows when deserializing a Debezium message with op='u'.
> This is helpful for Flink SQL applications that want to operate in UPSERT
> (ChangelogMode=[I,UA,D]) mode and save on processing the UPDATE_BEFORE Rows
> since the downstream sinks can handle it.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)