Hey Yaroslav,

Thanks for your response! Got it, so the need for UPDATE_BEFOREs will
depend on your sinks. I just watched the talk and it makes sense when you
think of the UPDATE_BEFOREs as retractions.
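
To check my understanding, here is a minimal Python sketch of that idea. It is a toy model, not Flink code: the function names and the (kind, key, value) tuples are made up for illustration, and only the "+I"/"-U"/"+U"/"-D" codes mirror Flink's changelog row kinds. It assumes a keyed upsert consumer on one side and a consumer that relies on retractions on the other.

```python
from typing import Dict, Iterable, List, Tuple

# (kind, primary key, value); hypothetical shape chosen for this sketch.
Change = Tuple[str, int, int]


def materialize_upsert(changes: Iterable[Change]) -> Dict[int, int]:
    """Keyed upsert consumer: the latest row per primary key wins."""
    table: Dict[int, int] = {}
    for kind, key, value in changes:
        if kind in ("+I", "+U"):
            table[key] = value      # insert or overwrite by key
        elif kind == "-D":
            table.pop(key, None)    # delete by key
        # "-U" (UPDATE_BEFORE) carries nothing new here and is skipped
    return table


def retracting_sum(changes: Iterable[Change]) -> int:
    """Consumer without key lookups: it depends on retractions to stay correct."""
    total = 0
    for kind, _key, value in changes:
        if kind in ("+I", "+U"):
            total += value          # accumulate the new value
        elif kind in ("-U", "-D"):
            total -= value          # retract the old value
    return total


# Row 1 is inserted with value 10, then updated to 25; row 2 is inserted with 5.
full: List[Change] = [("+I", 1, 10), ("-U", 1, 10), ("+U", 1, 25), ("+I", 2, 5)]
without_before: List[Change] = [c for c in full if c[0] != "-U"]

print(materialize_upsert(full), materialize_upsert(without_before))
print(retracting_sum(full), retracting_sum(without_before))
```

In this toy model the upsert view is identical with or without the "-U" rows, while the sum over-counts the old value of row 1 once the "-U" row is dropped.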

In the talk, Timo discusses how removing the need for UPDATE_BEFORE is an
optimization of sorts, if your use case allows for it, since it would
remove a bunch of messages that are processed by Flink.

I'm wondering about the converse: are there any situations where having
UPDATE_BEFOREs will result in improved performance? Does the planner take
advantage of them in some situations?
I don't have a specific example in mind; I'm just trying to understand the
full implications of missing UPDATE_BEFORE messages.

On Wed, Feb 7, 2024 at 4:24 PM Yaroslav Tkachenko
<yaros...@goldsky.com.invalid> wrote:

> Hey Kevin,
>
> In my experience it mostly depends on the type of your sinks. If all of
> your sinks can leverage primary keys and support upsert semantics, you
> don't really need UPDATE_BEFOREs altogether (you can even filter them out).
> But if you have sinks with append-only semantics (OR if you don't have
> primary keys defined) you need UPDATE_BEFOREs to correctly support
> retractions (in case of updates and deletes).
>
> Great talk on this topic:
> https://www.youtube.com/watch?v=iRlLaY-P6iE&ab_channel=PlainSchwarz (the
> middle part is the most relevant).
>
>
> On Wed, Feb 7, 2024 at 12:13 PM Kevin Lam <kevin....@shopify.com.invalid>
> wrote:
>
> > Hi there!
> >
> > I have a question about Changelog Stream Processing with Flink SQL and the
> > Flink Table API. I would like to better understand how UPDATE_BEFORE fields
> > are used by Flink.
> >
> > Our team uses Debezium to extract Change Data Capture events from MySQL
> > databases. We currently redact the `before` fields in the envelope [0] so
> > that redacted PII doesn't sit in our Kafka topics in the `before` field of
> > UPDATE events.
> >
> > As a result if we were to consume these CDC streams with Flink, there would
> > be missing UPDATE_BEFORE fields for UPDATE events. What kind of impact
> > would this have on performance and correctness, if any? Any other
> > considerations we should be aware of?
> >
> > Thanks in advance for your help!
> >
> >
> > [0]
> > https://debezium.io/documentation/reference/stable/connectors/mysql.html
> >
>
