Hi Dawid. I fully agree with optimizing the SinkUpsertMaterializer (SUM), as it
can easily become a bottleneck in production environments. After reading the
FLIP, I have the following questions:
1. The default behavior changes if no ON CONFLICT is defined. I am a little
concerned that this may cause errors in a large number of existing cases.
2. Regarding ON CONFLICT errors: with CDC streams, I expect that in the vast
majority of cases we cannot guarantee that a primary key yields only one
record. The only solutions I can think of are append-only top-1,
deduplication, or aggregating the first row.
3. The special watermark generation interval affects the visibility of results.
How can users configure this generation interval?
4. I believe that resolving out-of-order issues and addressing internal
consistency are two separate problems. As I understand the current solution, it
does not really resolve the internal consistency issue. We could first resolve
the out-of-order problem. For most scenarios that require real-time response,
we can directly output intermediate results promptly.
5. How can we compact data with the same custom watermark? If detailed
comparisons are necessary, I think we still need to preserve all key data; we
would just be compressing this data further at time t.
6. If neither this proposed solution nor the rejected solution can resolve
internal consistency, we need to reconsider the differences between the two
approaches.
7. As a side note, addressing internal consistency might exceed the scope
of this FLIP. I think it may be necessary to sacrifice some degree of real-time
performance; for instance, if we could divide the source table into multiple
versions of bounded data (for example, by using snapshots of the source table),
it might address the internal consistency issue to some extent. WDYT?
Looking forward to your reply ;)
--
Best!
Xuyang
At 2025-12-02 18:30:29, "Dawid Wysakowicz" <[email protected]> wrote:
>Hi everyone,
>
>I would like to start a discussion on FLIP-558 Improvements to
>SinkUpsertMaterializer and changelog disorder [1].
>
>I am trying to suggest a few improvements to how we use Sink Upsert
>Materializer there.
>
>Looking forward to your feedback and thoughts!
>
>[1] https://cwiki.apache.org/confluence/x/NoTMFw
>
>Best regards,
>Dawid