[jira] [Updated] (KAFKA-7626) Possible duplicate message delivery with exactly-once semantics

Noam Berman (JIRA) Wed, 14 Nov 2018 16:45:18 -0800


     [ 
https://issues.apache.org/jira/browse/KAFKA-7626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Noam Berman updated KAFKA-7626:
-------------------------------
    Attachment: partial.log

> Possible duplicate message delivery with exactly-once semantics
> ---------------------------------------------------------------
>
>                 Key: KAFKA-7626
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7626
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.1.0
>         Environment: google cloud build docker, all brokers, consumers and 
> producers running on the same container (this is a test, not production).
>            Reporter: Noam Berman
>            Priority: Major
>              Labels: exactly-once
>         Attachments: partial.log
>
>
> Hello,
> I've come across an issue with exactly-once processing (running kafka 1.1.0):
> In my test I bring up 3 brokers, and I start sending messages to `topicX`.
> While I'm sending messages, I bring up a few consumers on `topicX` one at a 
> time (all with the same group id) - and they produce the same message to 
> `topicY`. At some point I bring one broker down and up again, to check 
> resiliency to failures.
> Eventually I assert that `topicY` contains exactly the messages sent to 
> `topicX`.
> This usually works as expected, but when running the same test 1000s of times 
> to check for flakiness, some of them act as follows (in this order):
> 1. Consumer `C1` owns partition `p`.
> 1a. Consumers rebalance occurs (because one of the new consumers is starting).
> 1b. Consumer `C1` is revoked and then re-assigned partition `p`.
> 2. One of the 3 brokers starts controlled shutdown.
> 3. Consumer `C1` uses a transactional producer to send a message on offset 
> `o`.
> 4. Consumer `C1` sends offset `o+1` for partition `p` to the transaction.
> 5. Consumer `C1` successfully commits the message.
> 6. Broker controlled shutdown finishes successfully.
> ... a few seconds after...
> 7. Another consumer rebalance occurs, Consumer `C2` is assigned to partition 
> `p`.
> 8. Consumer `C2` polls message on offset `o` for partition `p`.
> This means we do double processing for the message on offset `o`, violating 
> exactly-once semantics.
> So it looks like during broker restart, a commit to the transactional 
> producer gets lost - and because we rebalance after that before another 
> commit happened, we actually poll the same message again, although previously 
> committed.
> The brokers are configured with:
> `transaction.state.log.min.isr=2`
> `transaction.state.log.replication.factor=3`
> `offsets.topic.replication.factor=3`
> The consumer is configured with
> `isolation.level=read_committed`
> The original producer to `topicX` has transactional semantics, and the test 
> shows that it didn't send double messages (using idempodent producer config).
>  
> Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (KAFKA-7626) Possible duplicate message delivery with exactly-once semantics

Reply via email to