Alex Sorokoumov created FLINK-31408:
---------------------------------------

             Summary: Add EXACTLY_ONCE support to upsert-kafka
                 Key: FLINK-31408
                 URL: https://issues.apache.org/jira/browse/FLINK-31408
             Project: Flink
          Issue Type: New Feature
          Components: Connectors / Kafka
            Reporter: Alex Sorokoumov


The {{upsert-kafka}} connector should support optional {{EXACTLY_ONCE}} delivery 
semantics.

The [upsert-kafka 
docs|https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/upsert-kafka/#consistency-guarantees]
 state that the connector tolerates the duplicate records produced under 
{{AT_LEAST_ONCE}} semantics. However, there are at least two reasons to configure 
the connector with {{EXACTLY_ONCE}}.

First, there might be other non-Flink topic consumers that would rather not 
have duplicated records.

Second, multiple {{upsert-kafka}} producers might cause keys to roll back to 
previous values. Consider a scenario with two producing jobs, A and B, writing to 
the same topic with {{AT_LEAST_ONCE}}, and a consuming job reading from that 
topic. Both producers write unique, monotonically increasing sequences to the 
same key: job A writes {{x=a1,a2,a3,a4,a5,...}} and job B writes 
{{x=b1,b2,b3,b4,b5,...}}. With this setup, the following sequence is possible:
 # Job A produces x=a5.
 # Job B produces x=b5.
 # Job A produces the duplicate write x=a5 (an {{AT_LEAST_ONCE}} retry).

The consuming job would observe {{x}} going to {{a5}}, then to {{b5}}, 
then back to {{a5}}. {{EXACTLY_ONCE}} would prevent this rollback.
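
For illustration, the feature might surface as a sink option analogous to the 
regular {{kafka}} connector's {{sink.delivery-guarantee}} / 
{{sink.transactional-id-prefix}} options. The DDL below is a sketch of that 
assumption, not a committed API; {{upsert-kafka}} does not expose these options 
today:

```sql
-- Hypothetical sketch: the two 'sink.*' options below are assumed by analogy
-- with the regular kafka connector and are NOT currently supported by upsert-kafka.
CREATE TABLE pageviews_per_region (
  region STRING,
  view_count BIGINT,
  PRIMARY KEY (region) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'pageviews_per_region',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json',
  -- proposed: write through Kafka transactions so consumers reading with
  -- isolation.level=read_committed never see duplicate or rolled-back values
  'sink.delivery-guarantee' = 'exactly-once',
  'sink.transactional-id-prefix' = 'upsert-kafka-eo'
);
```

With transactional writes, the duplicate in step 3 above would either be 
suppressed or remain invisible to {{read_committed}} consumers, so {{x}} could 
not appear to roll back from {{b5}} to {{a5}}.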



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
