[ https://issues.apache.org/jira/browse/KAFKA-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136216#comment-17136216 ]
Guozhang Wang commented on KAFKA-10167: --------------------------------------- I've also thought about whether we need to block on mainConsumer#committed before getting the end-offset, and now I think that is not necessary either since the end-offset only requires that data is flushed --- again, remember we have this single-writer single-reader scenario and when we are in the initialize-changelog-reader phase, we know that the other old producer would not be able to write any more unabortable data to that partition. > Streams EOS-Beta should not try to get end-offsets as read-committed > -------------------------------------------------------------------- > > Key: KAFKA-10167 > URL: https://issues.apache.org/jira/browse/KAFKA-10167 > Project: Kafka > Issue Type: Bug > Reporter: Guozhang Wang > Priority: Major > > This is a bug discovered with the new EOS protocol (KIP-447), here's the > context: > In Streams when we are assigned with the new active tasks, we would first try > to restore the state from the changelog topic all the way to the log end > offset, and then we can transit from the `restoring` to the `running` state > to start processing the task. > Before KIP-447, the end-offset call is only triggered after we've passed the > synchronization barrier at the txn-coordinator which would guarantee that the > txn-marker has been sent and received (otherwise we would error with > CONCURRENT_TRANSACTIONS and let the producer retry), and when the txn-marker > is received, it also means that the marker has been fully replicated, which > in turn guarantees that the data written before that marker has been fully > replicated. As a result, when we send the list-offset with `read-committed` > flag we are guaranteed that the returned offset == LSO == high-watermark. > After KIP-447 however, we do not fence on the txn-coordinator but on > group-coordinator upon offset-fetch, and the group-coordinator would return > the fetching offset right after it has received the replicated the txn-marker > sent to it. However, since the txn-marker are sent to different brokers in > parallel, and even within the same broker markers of different partitions are > appended / replicated independently as well, so when the fetch-offset request > returns it is NOT guaranteed that the LSO on other data partitions would have > been advanced as well. And hence in that case the `endOffset` call may > returned a smaller offset, causing data loss. -- This message was sent by Atlassian Jira (v8.3.4#803005)