Hey Dawid,

Thanks for taking the initiative and looking into this. As you mentioned, I think
there are valid use cases where we still don't have a great solution for
users (e.g. the engine is sometimes not smart enough to forward a full
upsert key if you use a built-in function). That said, I think this is a
good next step toward giving users good control within SQL. So, +1.

Comments from my side:
- The FLIP has a lot of parts, but I understand why we need all three
sections working together to achieve the goal. As I understand it, the
overall goal is to improve performance, and that is what drives the design
changes. It'd be nice to explain why the parts depend on each other and why
it makes sense to pack them into a single FLIP with the goal of improving
the performance of these use cases.
- Could you move the examples of where SUM (SinkUpsertMaterializer) is
needed, and where the FLIP will help, into the motivation? That way, it's
easier to get a clear understanding of what we're solving with the FLIP.
- Could you extend the FLIP to describe the behaviour when users try to
enable SUM but have no watermarks defined?
- In general, I personally prefer the new default behaviour for ON CONFLICT
(and if you make no changes to your SQL, you'll still get the current
behaviour).

Kind Regards,
Gustavo

On Fri, 5 Dec 2025 at 03:59, Xuyang <[email protected]> wrote:

> Hi, Dawid. I fully agree with optimizing the SUM, as it can easily become
> a bottleneck in production environments. After reading the FLIP, I have the
> following questions:
>
>
> 1. The default behavior changes if no ON CONFLICT clause is defined. I am a
> little concerned that this may cause errors in a large number of existing
> use cases.
> 2. Regarding the ON CONFLICT error behavior: in the context of CDC streams,
> I expect that in the vast majority of cases we cannot guarantee that only
> one record is produced per primary key. The only solutions I can think of
> are append-only Top-1, deduplication, or a first-row aggregation.
> 3. The generation interval of the special watermark affects when results
> become visible. How can users configure this interval?
> 4. I believe that resolving out-of-order issues and addressing internal
> consistency are two separate problems. As I understand it, the current
> proposal does not really resolve the internal consistency issue. We
> could first resolve the out-of-order problem. For most scenarios that
> require real-time responses, we can simply emit intermediate results
> promptly.
> 5. How can we compact data with the same custom watermark? If detailed
> comparisons are necessary, I think we still need to preserve all the data
> for a key; we would just be compacting it further at time t.
> 6. If neither the proposed solution nor the rejected alternative can
> resolve internal consistency, we need to reconsider the differences between
> the two approaches.
> 7. Slightly off-topic: addressing internal consistency might be beyond the
> scope of this FLIP. I think it may require sacrificing some degree of
> real-time performance; for instance, if we could divide the source table
> into multiple versions of bounded data (for example, by using snapshots of
> the source table), it might address the internal consistency issue to some
> extent. WDYT?
>
>
> Looking forward to your reply ;)
>
>
>
> --
>
>     Best!
>     Xuyang
>
>
> At 2025-12-02 18:30:29, "Dawid Wysakowicz" <[email protected]> wrote:
> >Hi everyone,
> >
> >I would like to start a discussion on FLIP-558 Improvements to
> >SinkUpsertMaterializer and changelog disorder [1].
> >
> >In it, I suggest a few improvements to how we use the
> >SinkUpsertMaterializer.
> >
> >Looking forward to your feedback and thoughts!
> >
> >[1] https://cwiki.apache.org/confluence/x/NoTMFw
> >
> >Best regards,
> >Dawid
>
