[ 
https://issues.apache.org/jira/browse/KAFKA-12475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306060#comment-17306060
 ] 

Bruno Cadonna commented on KAFKA-12475:
---------------------------------------

[~guozhang] Good point about bookkeeping.

[~ableegoldman] I think disabling the changelog for remote storage under EOS 
would currently break the EOS guarantee. To guarantee EOS with remote storage, 
we need two things: a way to flush the data to the remote storage 
transactionally, and a way to execute that transaction and verify whether it 
succeeded from within the Kafka transaction. Without these two things in place, 
we cannot abort a transaction when the flush to the remote storage fails. The 
changelog topic under EOS is what currently allows us to save the state 
transactionally. 
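
To illustrate the two requirements, here is a minimal sketch of how a commit 
could be tied to the outcome of a remote flush. All names in it 
(RemoteEosSketch, TransactionalRemoteStore, KafkaTxn, commitWithRemoteFlush) 
are hypothetical and are not part of the Kafka Streams API; KafkaTxn is only a 
stand-in for the surrounding Kafka transaction. It assumes the remote store can 
report whether a flush was applied in full.

```java
import java.util.List;

public class RemoteEosSketch {

    /** Hypothetical: a remote store whose flush reports success or failure. */
    public interface TransactionalRemoteStore {
        // Assumption: returns true only if every pending write was applied.
        boolean flush(List<String> pendingWrites);
    }

    /** Hypothetical stand-in for the Kafka transaction, not the producer API. */
    public interface KafkaTxn {
        void commit();
        void abort();
    }

    /**
     * Flushes to the remote store first and commits the Kafka transaction
     * only if the flush was verified; otherwise aborts.
     * Returns true if the transaction was committed, false if it was aborted.
     */
    public static boolean commitWithRemoteFlush(KafkaTxn txn,
                                                TransactionalRemoteStore store,
                                                List<String> pendingWrites) {
        boolean flushed;
        try {
            flushed = store.flush(pendingWrites);
        } catch (RuntimeException e) {
            // Treat any failure while flushing as an unverified flush.
            flushed = false;
        }
        if (flushed) {
            txn.commit();
            return true;
        }
        txn.abort();
        return false;
    }
}
```

Note that this only covers the case I described, aborting when the flush fails. 
A failure between a successful flush and the Kafka commit is not handled here 
and would still need the remote store to support rolling back the flush.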

> Kafka Streams breaks EOS with remote state stores
> -------------------------------------------------
>
>                 Key: KAFKA-12475
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12475
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>              Labels: needs-kip
>
> Currently in Kafka Streams, exactly-once semantics (EOS) require that the 
> state stores be completely erased and restored from the changelog from 
> scratch in case of an error. This erasure is implemented by closing the state 
> store and then simply wiping out the local state directory. This works fine 
> for the two store implementations provided OOTB, in-memory and rocksdb, but 
> fails when the application includes a custom StateStore based on remote 
> storage, such as Redis. In this case Streams will fail to erase any of the 
> data before reinserting data from the changelog, resulting in possible 
> duplicates and breaking the guarantee of EOS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)